Abstract: We consider the synthesis of control policies from 
temporal logic specifications for robots that interact with 
multiple dynamic environment agents. Each environment agent 
is modeled by a Markov chain whereas the robot is modeled by 
a finite transition system (in the deterministic case) or Markov 
decision process (in the stochastic case). Existing results in 
probabilistic verification are adapted to solve the synthesis 
problem. To partially address the state explosion issue, we 
propose an incremental approach where only a small subset of 
environment agents is incorporated in the synthesis procedure 
initially and more agents are successively added until we hit the 
constraints on computational resources. Our algorithm runs in 
an anytime fashion where the probability that the robot satisfies 
its specification increases as the algorithm progresses. 

I. Introduction 

Temporal logics [1], [2], [3] have been recently employed 
to precisely express complex behaviors of robots. In partic- 
ular, given a robot specification expressed as a formula in 
a temporal logic, control policies that ensure or maximize 
the probability that the robot satisfies the specification can 
be automatically synthesized based on exhaustive exploration 
of the state space [4], [5], [6], [7], [8], [9], [10], [11], [12]. 
Consequently, the main limitation of existing approaches for 
synthesizing control policies from temporal logic specifica- 
tions is almost invariably due to a combinatorial blow up 
of the state space, commonly known as the state explosion 
problem. 

In many applications, robots need to interact with external, 
potentially dynamic agents, including humans and other 
robots. As a result, the control policy synthesis problem 
becomes more computationally complex as more external 
agents are incorporated in the synthesis procedure. Consider, 
as an example, the problem where an autonomous vehicle 
needs to go through a pedestrian crossing while there are 
multiple pedestrians who are already at or approaching the 
crossing. The state space of the complete system (i.e., the 
vehicle and all the pedestrians) grows exponentially with the 
number of the pedestrians. Hence, given a limited budget of 
computational resources, solving the control policy synthesis 
problem with respect to temporal logic specifications may not 
be feasible when there are a large number of pedestrians. 

In this paper, we partially address the aforementioned issue 
and propose an algorithm for computing a robot control 
policy in an anytime manner. Our algorithm progressively 
computes a sequence of control policies, taking into account 
only a small subset of the environment agents initially and 
successively adds more agents to the synthesis procedure in 
each iteration until the computational resource constraints 
are exceeded. As opposed to existing incremental synthesis 
approaches that handle temporal logic specifications where 
representative robot states are incrementally added to the 
synthesis procedure [8], we consider incrementally adding 
representative environment agents instead. 

The main contribution of this paper is twofold. First, we 
propose an anytime algorithm for synthesizing a control pol- 
icy for a robot interacting with multiple environment agents 
with the objective of maximizing the probability for the 
robot to satisfy a given temporal logic specification. Second, 
an incremental construction of various objects needed to be 
computed during the synthesis procedure is proposed. Such 
an incremental construction makes our anytime algorithm 
more efficient by avoiding unnecessary computation and 
exploiting the objects computed in the previous iteration. 
Experimental results show that we not only obtain a reasonable 
solution much faster, but also obtain an optimal solution 
faster than existing approaches. 

The rest of the paper is organized as follows: We provide 
useful definitions and descriptions of the formalisms in the 
following section. Section III is dedicated to the problem 
formulation. Section IV provides a complete solution to 
the control policy synthesis problem for robots that interact 
with environment agents. Incremental computation of control 
policies is discussed in Section V. Section VI presents 
experimental results. Finally, Section VII concludes the paper 
and discusses future work. 

II. Preliminaries 

We consider systems that comprise multiple (possibly 
stochastic) components. In this section, we define the for- 
malisms used in this paper to describe such systems and 
their desired properties. Throughout the paper, we let $X^*$, 
$X^\omega$ and $X^+$ denote the set of finite, infinite and nonempty 
finite strings, respectively, of a set $X$. 

A. Automata 

Definition 1: A deterministic finite automaton (DFA) is a 
tuple $\mathcal{A} = (Q, \Sigma, \delta, q_{init}, F)$ where 

• $Q$ is a finite set of states, 

• $\Sigma$ is a finite set called alphabet, 

• $\delta : Q \times \Sigma \to Q$ is a transition function, 

• $q_{init} \in Q$ is the initial state, and 

• $F \subseteq Q$ is a set of final states. 

We use the relation notation $q \xrightarrow{w} q'$ to denote $\delta(q, w) = q'$. 
Consider a finite string $\sigma = \sigma_1 \sigma_2 \ldots \sigma_n \in \Sigma^*$. A run for 
$\sigma$ in a DFA $\mathcal{A} = (Q, \Sigma, \delta, q_{init}, F)$ is a finite sequence of 
states $q_0 q_1 \ldots q_n$ such that $q_0 = q_{init}$ and 
$q_0 \xrightarrow{\sigma_1} q_1 \xrightarrow{\sigma_2} q_2 \xrightarrow{\sigma_3} \ldots \xrightarrow{\sigma_n} q_n$. 
A run is accepting if $q_n \in F$. A string $\sigma \in \Sigma^*$ is accepted by 
$\mathcal{A}$ if there is an accepting run of $\sigma$ in $\mathcal{A}$. The language 
accepted by $\mathcal{A}$, denoted by $\mathcal{L}(\mathcal{A})$, is the 
set of all accepted strings of $\mathcal{A}$. 
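
As a minimal sketch of Definition 1, the following Python function computes the run of a word in a DFA and checks whether it is accepting. The dictionary encoding of $\delta$ and the two-state example automaton are illustrative assumptions, not objects taken from this paper.

```python
from typing import Dict, Hashable, Iterable, Set, Tuple

State = Hashable
Symbol = Hashable


def dfa_accepts(
    delta: Dict[Tuple[State, Symbol], State],
    q_init: State,
    final: Set[State],
    word: Iterable[Symbol],
) -> bool:
    """Follow the unique run of `word` from q_init and test q_n in F."""
    q = q_init
    for symbol in word:
        if (q, symbol) not in delta:
            # The encoding is partial here: no run exists for this word.
            return False
        q = delta[(q, symbol)]
    return q in final


# Hypothetical DFA over {"a", "b"} accepting words with at least one "b".
delta = {
    ("q0", "a"): "q0",
    ("q0", "b"): "q1",
    ("q1", "a"): "q1",
    ("q1", "b"): "q1",
}
accepted = dfa_accepts(delta, "q0", {"q1"}, ["a", "b", "a"])
```

Because the automaton is deterministic, each word has at most one run, so acceptance needs only a single pass over the word.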

B. Linear Temporal Logic 

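Linear temporal logic (LTL) is a branch of logic that can 
be used to reason about a time line. An LTL formula is built 
up from a set $\Pi$ of atomic propositions, the logic connectives 
$\neg$, $\vee$, $\wedge$ and $\Rightarrow$ and the temporal modal operators $\bigcirc$ 
("next"), $\square$ ("always"), $\Diamond$ ("eventually") and $\mathcal{U}$ ("until"). 
An LTL formula over a set $\Pi$ of atomic propositions is 
inductively defined as 

$\varphi := True \mid p \mid \neg\varphi \mid \varphi \wedge \varphi \mid \bigcirc\varphi \mid \varphi \,\mathcal{U}\, \varphi$ 

where $p \in \Pi$. Other operators can be defined as follows: 
$\varphi \vee \psi = \neg(\neg\varphi \wedge \neg\psi)$, $\varphi \Rightarrow \psi = \neg\varphi \vee \psi$, $\Diamond\varphi = True \,\mathcal{U}\, \varphi$, 
and $\square\varphi = \neg\Diamond\neg\varphi$. 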

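Semantics of LTL: LTL formulas are interpreted on infinite 
strings over $2^\Pi$. Let $\sigma = \sigma_0 \sigma_1 \sigma_2 \ldots$ where $\sigma_i \in 2^\Pi$ for 
all $i \geq 0$. The satisfaction relation $\models$ is defined inductively 
on LTL formulas as follows: 

• $\sigma \models True$, 

• for an atomic proposition $p \in \Pi$, $\sigma \models p$ if and only if 
$p \in \sigma_0$, 

• $\sigma \models \neg\varphi$ if and only if $\sigma \not\models \varphi$, 

• $\sigma \models \varphi_1 \wedge \varphi_2$ if and only if $\sigma \models \varphi_1$ and $\sigma \models \varphi_2$, 

• $\sigma \models \bigcirc\varphi$ if and only if $\sigma_1 \sigma_2 \ldots \models \varphi$, and 

• $\sigma \models \varphi_1 \,\mathcal{U}\, \varphi_2$ if and only if there exists $j \geq 0$ such 
that $\sigma_j \sigma_{j+1} \ldots \models \varphi_2$ and for all $i$ such that $0 \leq i < j$, 
$\sigma_i \sigma_{i+1} \ldots \models \varphi_1$. 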

More details on LTL can be found, e.g., in [1], [2], [3]. 

In this paper, we are particularly interested in a class of 
LTL known as co-safety formulas. An important property of 
a co-safety formula is that any word satisfying the formula 
has a finite good prefix, i.e., a finite prefix that cannot 
be extended to violate the formula. Specifically, given an 
alphabet $\Sigma$, a language $L \subseteq \Sigma^\omega$ is co-safety if and only 
if every $w \in L$ has a good prefix $x \in \Sigma^*$ such that 
for all $y \in \Sigma^\omega$, we have $x \cdot y \in L$. In general, the 
problem of determining whether an LTL formula is co-safety 
is PSPACE-complete [13]. However, there is a class 
of co-safety formulas, known as syntactically co-safe LTL 
formulas, which can be easily characterized. A syntactically 
co-safe LTL formula over $\Pi$ is an LTL formula over $\Pi$ whose 
only temporal operators are $\bigcirc$, $\Diamond$ and $\mathcal{U}$ when written in 
positive normal form where the negation operator $\neg$ occurs 
only in front of atomic propositions [3], [13]. It can be shown 
that for any syntactically co-safe formula $\varphi$, there exists a 
DFA $\mathcal{A}_\varphi$ that accepts all and only words in $pref(\varphi)$, i.e., 
$\mathcal{L}(\mathcal{A}_\varphi) = pref(\varphi)$, where $pref(\varphi)$ denotes the set of all 
good prefixes for $\varphi$ [9]. 



C. Systems and Control Policies 

We consider the case where each component of the system 
can be modeled by a deterministic finite transition system, 
Markov chain or Markov decision process, depending on the 
characteristics of that component. These different models are 
defined as follows. 

Definition 2: A deterministic finite transition system 
(DFTS) is a tuple $\mathcal{T} = (S, Act, \to, s_{init}, \Pi, L)$ where 

• $S$ is a finite set of states, 

• $Act$ is a finite set of actions, 

• $\to \subseteq S \times Act \times S$ is a transition relation such that 
for all $s \in S$ and $a \in Act$, $|Post(s, a)| \leq 1$ where 
$Post(s, a) = \{s' \in S \mid (s, a, s') \in \to\}$, 

• $s_{init} \in S$ is the initial state, 

• $\Pi$ is a set of atomic propositions, and 

• $L : S \to 2^\Pi$ is a labeling function. 

$(s, a, s') \in \to$ is denoted by $s \xrightarrow{a} s'$. An action $a$ is 
enabled in state $s$ if and only if there exists $s'$ such that 
$s \xrightarrow{a} s'$. 

Definition 3: A (discrete-time) Markov chain (MC) is a 
tuple $\mathcal{M} = (S, P, \iota_{init}, \Pi, L)$ where $S$, $\Pi$ and $L$ are defined 
as in DFTS and 

• $P : S \times S \to [0, 1]$ is the transition probability function 
such that for any state $s \in S$, $\sum_{s' \in S} P(s, s') = 1$, and 

• $\iota_{init} : S \to [0, 1]$ is the initial state distribution 
satisfying $\sum_{s \in S} \iota_{init}(s) = 1$. 

Definition 4: A Markov decision process (MDP) is a tuple 
$\mathcal{M} = (S, Act, P, \iota_{init}, \Pi, L)$ where $S$, $Act$, $\iota_{init}$, $\Pi$ and $L$ 
are defined as in DFTS and MC and $P : S \times Act \times S \to [0, 1]$ 
is the transition probability function such that for any state 
$s \in S$ and action $a \in Act$, $\sum_{s' \in S} P(s, a, s') \in \{0, 1\}$. 

An action $a$ is enabled in state $s$ if and only if 
$\sum_{s' \in S} P(s, a, s') = 1$. Let $Act(s)$ denote the set of enabled 
actions in $s$. 

Given a complete system as the composition of all its 
components, we are interested in computing a control policy 
for the system that optimizes certain objectives. We define a 
control policy for a system modeled by an MDP as follows. 

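Definition 5: Let $\mathcal{M} = (S, Act, P, \iota_{init}, \Pi, L)$ be a 
Markov decision process. A control policy for $\mathcal{M}$ is a 
function $\mathcal{C} : S^+ \to Act$ such that $\mathcal{C}(s_0 s_1 \ldots s_n) \in Act(s_n)$ 
for all $s_0 s_1 \ldots s_n \in S^+$. 

Let $\mathcal{M} = (S, Act, P, \iota_{init}, \Pi, L)$ be an MDP and $\mathcal{C} : S^+ \to Act$ 
be a control policy for $\mathcal{M}$. Given an initial 
state $s_0$ of $\mathcal{M}$ such that $\iota_{init}(s_0) > 0$, an infinite sequence 
$r^{\mathcal{C}}_{\mathcal{M}} = s_0 s_1 \ldots$ on $\mathcal{M}$ generated under policy $\mathcal{C}$ is called 
a path on $\mathcal{M}$ if $P(s_i, \mathcal{C}(s_0 s_1 \ldots s_i), s_{i+1}) > 0$ for all $i$. 
The subsequence $s_0 s_1 \ldots s_n$ where $n \geq 0$ is the prefix 
of length $n$ of $r^{\mathcal{C}}_{\mathcal{M}}$. We define $Paths^{\mathcal{C}}_{\mathcal{M}}$ and $FPaths^{\mathcal{C}}_{\mathcal{M}}$ 
as the set of all infinite paths of $\mathcal{M}$ under policy $\mathcal{C}$ and 
their finite prefixes, respectively, starting from any state $s_0$ 
with $\iota_{init}(s_0) > 0$. For $s_0 s_1 \ldots s_n \in FPaths^{\mathcal{C}}_{\mathcal{M}}$, we let 
$Paths^{\mathcal{C}}_{\mathcal{M}}(s_0 s_1 \ldots s_n)$ denote the set of all paths in $Paths^{\mathcal{C}}_{\mathcal{M}}$ 
with the prefix $s_0 s_1 \ldots s_n$. 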

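The $\sigma$-algebra associated with $\mathcal{M}$ under policy $\mathcal{C}$ is defined 
as the smallest $\sigma$-algebra that contains $Paths^{\mathcal{C}}_{\mathcal{M}}(\hat{r}^{\mathcal{C}}_{\mathcal{M}})$ where 
$\hat{r}^{\mathcal{C}}_{\mathcal{M}}$ ranges over all finite paths in $FPaths^{\mathcal{C}}_{\mathcal{M}}$. It follows 
that there exists a unique probability measure $Pr^{\mathcal{C}}_{\mathcal{M}}$ on the 
$\sigma$-algebra associated with $\mathcal{M}$ under policy $\mathcal{C}$ where for any 
$s_0 s_1 \ldots s_n \in FPaths^{\mathcal{C}}_{\mathcal{M}}$, 

$$Pr^{\mathcal{C}}_{\mathcal{M}}\{Paths^{\mathcal{C}}_{\mathcal{M}}(s_0 s_1 \ldots s_n)\} = \iota_{init}(s_0) \prod_{0 \leq i < n} P(s_i, \mathcal{C}(s_0 s_1 \ldots s_i), s_{i+1}).$$ 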
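
The product formula above can be evaluated directly for a given finite path. The sketch below assumes a dictionary encoding of $\iota_{init}$ and $P$ and a policy given as a Python callable on the history; these encodings and the small example are hypothetical.

```python
from typing import Callable, Dict, Hashable, Sequence, Tuple

State = Hashable
Action = Hashable


def finite_path_probability(
    init: Dict[State, float],
    P: Dict[Tuple[State, Action, State], float],
    policy: Callable[[Tuple[State, ...]], Action],
    path: Sequence[State],
) -> float:
    """Measure of the set of infinite paths that share the prefix `path`."""
    prob = init.get(path[0], 0.0)
    for i in range(len(path) - 1):
        # The policy may depend on the whole history s_0 ... s_i.
        action = policy(tuple(path[: i + 1]))
        prob *= P.get((path[i], action, path[i + 1]), 0.0)
    return prob


# Hypothetical MDP with a single action "go".
init = {"s0": 1.0}
P = {
    ("s0", "go", "s0"): 0.5,
    ("s0", "go", "s1"): 0.5,
    ("s1", "go", "s1"): 1.0,
}
prefix_prob = finite_path_probability(init, P, lambda hist: "go", ["s0", "s0", "s1"])
```

For this example the prefix $s_0 s_0 s_1$ has measure $1 \cdot 0.5 \cdot 0.5 = 0.25$.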

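Given an LTL formula $\varphi$, one can show that the set 
$\{s_0 s_1 \ldots \in Paths^{\mathcal{C}}_{\mathcal{M}} \mid L(s_0) L(s_1) \ldots \models \varphi\}$ is measurable 
[3]. The probability for $\mathcal{M}$ to satisfy $\varphi$ under policy $\mathcal{C}$ is 
then defined as 

$$Pr^{\mathcal{C}}_{\mathcal{M}}(\varphi) = Pr^{\mathcal{C}}_{\mathcal{M}}\{s_0 s_1 \ldots \in Paths^{\mathcal{C}}_{\mathcal{M}} \mid L(s_0) L(s_1) \ldots \models \varphi\}.$$ 

For a given (possibly noninitial) state $s \in S$, we let 
$\mathcal{M}^s = (S, Act, P, \iota^s_{init}, \Pi, L)$ where $\iota^s_{init}(t) = 1$ if $s = t$ and 
$\iota^s_{init}(t) = 0$ otherwise. We define $Pr^{\mathcal{C}}_{\mathcal{M}}(s \models \varphi) = Pr^{\mathcal{C}}_{\mathcal{M}^s}(\varphi)$ 
as the probability for $\mathcal{M}$ to satisfy $\varphi$ under policy $\mathcal{C}$, starting 
from $s$. 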

A control policy essentially resolves all nondeterministic 
choices in an MDP and induces a Markov chain $\mathcal{M}_{\mathcal{C}}$ that 
formalizes the behavior of $\mathcal{M}$ under control policy $\mathcal{C}$ [3]. In 
general, $\mathcal{M}_{\mathcal{C}}$ contains all the states in $S^+$ and hence may 
not be finite even though $\mathcal{M}$ is finite. However, for a special 
case where $\mathcal{C}$ is memoryless, it can be shown that $\mathcal{M}_{\mathcal{C}}$ can 
be identified with a finite MC. 

Definition 6: Let $\mathcal{M} = (S, Act, P, \iota_{init}, \Pi, L)$ be a 
Markov decision process. A control policy $\mathcal{C}$ on $\mathcal{M}$ is 
memoryless if and only if for each pair of sequences $s_0 s_1 \ldots s_n$ 
and $t_0 t_1 \ldots t_m \in S^+$ with $s_n = t_m$, $\mathcal{C}(s_0 s_1 \ldots s_n) = 
\mathcal{C}(t_0 t_1 \ldots t_m)$. A memoryless control policy $\mathcal{C}$ can be 
described by a function $\mathcal{C} : S \to Act$. 

III. Problem Formulation 

Consider a system that comprises the plant (e.g., the robot) 
and N independent environment agents. We assume that at 
any time instance, the state of the system, which incorporates 
the state of the plant and the environment agents, can be 
precisely observed. The system can regulate the state of the 
plant but has no control over the state of the environment 
agents. Hence, we do not distinguish between a control 
policy for the system and a control policy for the plant and 
refer to them as a control policy in general, as there is no 
confusion that in both cases, only the state of the plant can 
be regulated and both the system and the plant can precisely 
observe the current state of the complete system. Hence, even 
though a control policy may be implemented on the plant, it 
may be defined over the state of the complete system. 

We assume that each environment agent can be modeled 
by a finite Markov chain. Let $\mathcal{M}_i = (S_i, P_i, \iota_{init,i}, \Pi_i, L_i)$ 
be the model of the $i$th environment agent. The plant is 
modeled either by a deterministic finite transition system or 
by a finite Markov decision process, depending on whether 
each control action leads to a deterministic state transition. 
We use $\mathcal{T}$ to denote the model of the plant and let $\mathcal{T} = 
(S_0, Act, \to, s_{init,0}, \Pi_0, L_0)$ for the case where $\mathcal{T}$ is a 
DFTS and $\mathcal{T} = (S_0, Act, P_0, \iota_{init,0}, \Pi_0, L_0)$ for the case 
where $\mathcal{T}$ is an MDP. For the simplicity of the presentation, 
we assume that for all $s \in S_0$, $Act(s) \neq \emptyset$. In addition, we 
assume that all the components $\mathcal{T}, \mathcal{M}_1, \mathcal{M}_2, \ldots, \mathcal{M}_N$ in the 
system make a transition simultaneously, i.e., each of them 
makes a transition at every time step. 

Example 1: Consider a problem where an autonomous 
vehicle (plant) needs to go through a pedestrian crossing 
while there are $N$ pedestrians (agents) who are already at or 
approaching the crossing. Suppose the road is discretized into 
a finite number of cells $c_0, c_1, \ldots, c_M$. The vehicle is 
modeled by either a DFTS $\mathcal{T} = (S_0, Act, \to, s_{init,0}, \Pi_0, L_0)$ 
or an MDP $\mathcal{T} = (S_0, Act, P_0, \iota_{init,0}, \Pi_0, L_0)$ whose state 
$s \in S_0$ describes the cell occupied by the vehicle and whose 
action $a \in Act$ corresponds to a motion primitive of the 
vehicle (e.g., stop, accelerate, decelerate). If each motion 
primitive leads to a deterministic change in the vehicle's 
state, then $\mathcal{T}$ is a DFTS. Otherwise, $\mathcal{T}$ is an MDP. The 
motion of the $i$th pedestrian is modeled by an MC $\mathcal{M}_i = 
(S_i, P_i, \iota_{init,i}, \Pi_i, L_i)$ whose state $s \in S_i$ describes the cell 
occupied by the $i$th pedestrian. The labeling function $L_i$, $i \in 
\{0, \ldots, N\}$, essentially maps each cell to its label, indexed 
by the agent ID, i.e., $L_i(c_j) = c^i_j$ for all $j \in \{0, \ldots, M\}$. 

Control Policy Synthesis Problem: Given a system 
model described by $\mathcal{T}, \mathcal{M}_1, \ldots, \mathcal{M}_N$ and a syntactically 
co-safe LTL formula $\varphi$ over $\Pi_0 \cup \Pi_1 \cup \ldots \cup \Pi_N$, we want to 
automatically synthesize a control policy that maximizes the 
probability for the system to satisfy $\varphi$. 

Example 2: Consider the autonomous vehicle problem 
described in Example 1 and the desired property stating 
that the vehicle does not collide with any pedestrian until 
it reaches cell $c_M$ (e.g., the other side of the pedestrian 
crossing). In this case, the specification $\varphi$ is given by 
$\varphi = \left(\neg \bigvee_{i \geq 1, j \geq 0} (c^0_j \wedge c^i_j)\right) \mathcal{U}\, c^0_M$. 
Using simple logic manipulation, it can be checked that $\varphi$ is 
a co-safe LTL formula. 

IV. Control Policy Synthesis 

We employ existing results in probabilistic verification and 
consider the following 3 main steps to solve the control 
policy synthesis problem defined in Section III: 

1) Compute the composition of all the system components 
to obtain the complete system. 

2) Construct the product MDP. 

3) Extract an optimal control policy for the product MDP. 

In this section, we describe these steps in more detail 
and discuss their connection to our control policy synthesis 
problem described in Section III. 



A. Parallel Composition of System Components 

Assuming that all the components of the system make a 
transition simultaneously, we first construct the synchronous 
parallel composition of all the components to obtain the com- 
plete system. Synchronous parallel composition of different 
types of components is defined as follows. 

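Definition 7: Let $\mathcal{M}_1 = (S_1, P_1, \iota_{init,1}, \Pi_1, L_1)$ and 
$\mathcal{M}_2 = (S_2, P_2, \iota_{init,2}, \Pi_2, L_2)$ be Markov chains. Their 
synchronous parallel composition, denoted by $\mathcal{M}_1 \| \mathcal{M}_2$, is 
the MC $\mathcal{M} = (S_1 \times S_2, P, \iota_{init}, \Pi_1 \cup \Pi_2, L)$ where: 

• For each $s_1, s'_1 \in S_1$ and $s_2, s'_2 \in S_2$, 
$P(\langle s_1, s_2 \rangle, \langle s'_1, s'_2 \rangle) = P_1(s_1, s'_1) P_2(s_2, s'_2)$. 

• For each $s_1 \in S_1$ and $s_2 \in S_2$, $\iota_{init}(\langle s_1, s_2 \rangle) = 
\iota_{init,1}(s_1) \iota_{init,2}(s_2)$. 

• For each $s_1 \in S_1$ and $s_2 \in S_2$, $L(\langle s_1, s_2 \rangle) = L_1(s_1) \cup 
L_2(s_2)$. 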
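
A minimal sketch of Definition 7 follows. The dictionary-based encoding of a Markov chain (`states`, `P`, `init`, `L`) and the two pedestrian models are assumptions made for illustration only.

```python
from itertools import product


def compose_mcs(mc1, mc2):
    """Synchronous parallel composition of two Markov chains.

    Each chain is a dict with keys 'states' (list), 'P' ((s, s') -> prob),
    'init' (s -> prob) and 'L' (s -> frozenset of atomic propositions).
    """
    states = list(product(mc1["states"], mc2["states"]))
    P = {}
    for (s1, s2) in states:
        for (t1, t2) in states:
            # Both chains move at the same time, independently.
            p = mc1["P"].get((s1, t1), 0.0) * mc2["P"].get((s2, t2), 0.0)
            if p > 0.0:
                P[((s1, s2), (t1, t2))] = p
    init = {
        (s1, s2): mc1["init"].get(s1, 0.0) * mc2["init"].get(s2, 0.0)
        for (s1, s2) in states
    }
    L = {(s1, s2): mc1["L"][s1] | mc2["L"][s2] for (s1, s2) in states}
    return {"states": states, "P": P, "init": init, "L": L}


def make_pedestrian(name):
    """Hypothetical two-cell pedestrian model with agent-indexed labels."""
    return {
        "states": ["c0", "c1"],
        "P": {("c0", "c0"): 0.4, ("c0", "c1"): 0.6, ("c1", "c1"): 1.0},
        "init": {"c0": 1.0},
        "L": {"c0": frozenset({name + "_c0"}), "c1": frozenset({name + "_c1"})},
    }


joint = compose_mcs(make_pedestrian("p1"), make_pedestrian("p2"))
```

The composed chain has $|S_1| \cdot |S_2|$ states, which is the source of the exponential growth discussed in the introduction.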

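Definition 8: Let $\mathcal{T}_1 = (S_1, Act, \to, s_{init}, \Pi_1, L_1)$ be 
a deterministic finite transition system and $\mathcal{M}_2 = 
(S_2, P_2, \iota_{init,2}, \Pi_2, L_2)$ be a Markov chain. Their 
synchronous parallel composition, denoted by $\mathcal{T}_1 \| \mathcal{M}_2$, is the 
MDP $\mathcal{M} = (S_1 \times S_2, Act, P, \iota_{init}, \Pi_1 \cup \Pi_2, L)$ where: 

• For each $s_1, s'_1 \in S_1$, $s_2, s'_2 \in S_2$ and $a \in Act$, 
$P(\langle s_1, s_2 \rangle, a, \langle s'_1, s'_2 \rangle) = P_2(s_2, s'_2)$ if $s_1 \xrightarrow{a} s'_1$ and 
$P(\langle s_1, s_2 \rangle, a, \langle s'_1, s'_2 \rangle) = 0$ otherwise. 

• For each $s_2 \in S_2$, $\iota_{init}(\langle s_{init}, s_2 \rangle) = \iota_{init,2}(s_2)$ and 
$\iota_{init}(\langle s_1, s_2 \rangle) = 0$ for all $s_1 \in S_1 \setminus \{s_{init}\}$. 

• For each $s_1 \in S_1$ and $s_2 \in S_2$, $L(\langle s_1, s_2 \rangle) = L_1(s_1) \cup 
L_2(s_2)$. 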
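
Definition 8 can be sketched in the same style. The encoding of the DFTS as a dictionary `trans` from (state, action) to the unique successor, and the small vehicle and pedestrian models, are illustrative assumptions.

```python
def compose_dfts_mc(dfts, mc):
    """Synchronous parallel composition of a DFTS and a Markov chain.

    dfts: dict with 'states', 'actions', 'trans' ((s, a) -> s'), 'init', 'L'.
    mc:   dict with 'states', 'P' ((s, s') -> prob), 'init' (s -> prob), 'L'.
    Returns an MDP whose 'P' maps (state, action, next_state) to a probability.
    """
    states = [(s1, s2) for s1 in dfts["states"] for s2 in mc["states"]]
    P = {}
    for (s1, a), t1 in dfts["trans"].items():
        for (s2, t2), p in mc["P"].items():
            if p > 0.0:
                # The plant moves deterministically; only the agent is random.
                P[((s1, s2), a, (t1, t2))] = p
    init = {
        (s1, s2): (mc["init"].get(s2, 0.0) if s1 == dfts["init"] else 0.0)
        for (s1, s2) in states
    }
    L = {(s1, s2): dfts["L"][s1] | mc["L"][s2] for (s1, s2) in states}
    return {"states": states, "actions": dfts["actions"], "P": P, "init": init, "L": L}


# Hypothetical vehicle (plant) and pedestrian (agent) on two cells.
vehicle = {
    "states": ["c0", "c1"],
    "actions": ["stop", "go"],
    "trans": {("c0", "stop"): "c0", ("c0", "go"): "c1", ("c1", "stop"): "c1"},
    "init": "c0",
    "L": {"c0": frozenset({"v_c0"}), "c1": frozenset({"v_c1"})},
}
pedestrian = {
    "states": ["c0", "c1"],
    "P": {("c0", "c0"): 0.4, ("c0", "c1"): 0.6, ("c1", "c1"): 1.0},
    "init": {"c0": 1.0},
    "L": {"c0": frozenset({"p_c0"}), "c1": frozenset({"p_c1"})},
}
mdp = compose_dfts_mc(vehicle, pedestrian)
```

An action that is not enabled in a plant state contributes no entries to `P`, so its outgoing probabilities sum to 0, as Definition 4 requires.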

Definition 9: Let $\mathcal{M}_1 = (S_1, Act, P_1, \iota_{init,1}, \Pi_1, L_1)$ be a 
Markov decision process and $\mathcal{M}_2 = (S_2, P_2, \iota_{init,2}, \Pi_2, L_2)$ 
be a Markov chain. Their synchronous parallel composition, 
denoted by $\mathcal{M}_1 \| \mathcal{M}_2$, is the MDP $\mathcal{M} = (S_1 \times 
S_2, Act, P, \iota_{init}, \Pi_1 \cup \Pi_2, L)$ where: 

• For each $s_1, s'_1 \in S_1$, $s_2, s'_2 \in S_2$ and $a \in Act$, 
$P(\langle s_1, s_2 \rangle, a, \langle s'_1, s'_2 \rangle) = P_1(s_1, a, s'_1) P_2(s_2, s'_2)$. 

• For each $s_1 \in S_1$ and $s_2 \in S_2$, $\iota_{init}(\langle s_1, s_2 \rangle) = 
\iota_{init,1}(s_1) \iota_{init,2}(s_2)$. 

• For each $s_1 \in S_1$ and $s_2 \in S_2$, $L(\langle s_1, s_2 \rangle) = L_1(s_1) \cup 
L_2(s_2)$. 

From the above definitions, our complete system can 
be modeled by the MDP $\mathcal{T} \| \mathcal{M}_1 \| \ldots \| \mathcal{M}_N$, regardless of 
whether $\mathcal{T}$ is a DFTS or an MDP. We denote this MDP by 
$\mathcal{M} = (S, Act, P, \iota_{init}, \Pi, L)$. 

B. Construction of Product MDP 

Let $\mathcal{A}_\varphi = (Q, 2^\Pi, \delta, q_{init}, F)$ be a DFA that recognizes 
the good prefixes of $\varphi$. Such $\mathcal{A}_\varphi$ can be automatically 
constructed using existing tools [14]. Our next step is to 
obtain a finite MDP $\mathcal{M}_p = (S_p, Act_p, P_p, \iota_{p,init}, \Pi_p, L_p)$ as 
the product of $\mathcal{M}$ and $\mathcal{A}_\varphi$, defined as follows. 

Definition 10: Let $\mathcal{M} = (S, Act, P, \iota_{init}, \Pi, L)$ be an 
MDP and let $\mathcal{A} = (Q, 2^\Pi, \delta, q_{init}, F)$ be a DFA. The product 
of $\mathcal{M}$ and $\mathcal{A}$ is the MDP $\mathcal{M}_p = \mathcal{M} \otimes \mathcal{A}$ defined by¹ 
$\mathcal{M}_p = (S_p, Act, P_p, \iota_{p,init}, \Pi, L_p)$ where $S_p = S \times Q$ and 
$L_p(\langle s, q \rangle) = L(s)$. $P_p$ is defined as 

$$P_p(\langle s, q \rangle, a, \langle s', q' \rangle) = \begin{cases} \tilde{P}_p(\langle s, q \rangle, a, \langle s', q' \rangle) & \text{if } q' = \delta(q, L(s')) \\ 0 & \text{otherwise} \end{cases} \qquad (1)$$ 

where $\tilde{P}_p(\langle s, q \rangle, a, \langle s', q' \rangle) = P(s, a, s')$. For the rest of 
the paper, we refer to $\tilde{P}_p : S_p \times Act \times S_p \to [0, 1]$ as the 
intermediate transition probability function for $\mathcal{M}_p$. Finally, 

$$\iota_{p,init}(\langle s, q \rangle) = \begin{cases} \tilde{\iota}_{p,init}(\langle s, q \rangle) & \text{if } q = \delta(q_{init}, L(s)) \\ 0 & \text{otherwise} \end{cases} \qquad (2)$$ 

where $\tilde{\iota}_{p,init}(\langle s, q \rangle) = \iota_{init}(s)$. For the rest of the paper, we 
refer to $\tilde{\iota}_{p,init} : S_p \to [0, 1]$ as the intermediate initial state 
distribution for $\mathcal{M}_p$. 

¹We slightly modify the definition of atomic propositions and labeling 
function of the product MDP from the definition often used in literature to 
facilitate incremental construction of product MDP, which is explained in 
Section V-B. 
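
The following sketch illustrates Definition 10, keeping both the intermediate functions and the functions of the product itself. The dictionary encodings, the key names and the two-state example are hypothetical; the DFA transition function is assumed to be keyed by (state, label) with labels stored as frozensets.

```python
def product_mdp(mdp, dfa):
    """Product of an MDP with a DFA, following Eqs. (1) and (2).

    mdp: dict with 'states', 'P' ((s, a, s') -> prob), 'init', 'L'.
    dfa: dict with 'states', 'delta' ((q, label) -> q'), 'q_init', 'final'.
    """
    P_tilde, P_p = {}, {}
    for (s, a, t), p in mdp["P"].items():
        for q in dfa["states"]:
            for q_next in dfa["states"]:
                key = ((s, q), a, (t, q_next))
                P_tilde[key] = p  # intermediate function ignores the DFA
                if dfa["delta"].get((q, mdp["L"][t])) == q_next:
                    P_p[key] = p  # Eq. (1): keep only DFA-consistent moves
    init_tilde, init_p = {}, {}
    for s, p in mdp["init"].items():
        for q in dfa["states"]:
            init_tilde[(s, q)] = p
            if dfa["delta"].get((dfa["q_init"], mdp["L"][s])) == q:
                init_p[(s, q)] = p  # Eq. (2)
    goal = {(s, q) for s in mdp["states"] for q in dfa["final"]}
    return {"P": P_p, "P_tilde": P_tilde, "init": init_p,
            "init_tilde": init_tilde, "goal": goal}


# Hypothetical MDP and a DFA for "eventually goal".
empty, at_goal = frozenset(), frozenset({"goal"})
mdp = {
    "states": ["a", "b"],
    "P": {("a", "go", "b"): 1.0, ("b", "go", "b"): 1.0},
    "init": {"a": 1.0},
    "L": {"a": empty, "b": at_goal},
}
dfa = {
    "states": ["q0", "q1"],
    "q_init": "q0",
    "final": {"q1"},
    "delta": {("q0", empty): "q0", ("q0", at_goal): "q1",
              ("q1", empty): "q1", ("q1", at_goal): "q1"},
}
prod = product_mdp(mdp, dfa)
```

Storing `P_tilde` and `init_tilde` separately mirrors the role of the intermediate functions, which do not depend on the labeling and can therefore be reused when the labeling changes.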



Stepping through the above definition shows that given 
a path $r^{\mathcal{C}_p}_{\mathcal{M}_p} = \langle s_0, q_0 \rangle \langle s_1, q_1 \rangle \ldots$ on $\mathcal{M}_p$ generated under 
some control policy $\mathcal{C}_p$, the corresponding path $s_0 s_1 \ldots$ on 
$\mathcal{M}$ generates a word $L(s_0) L(s_1) \ldots$ that satisfies $\varphi$ if and 
only if there exists $n \geq 0$ such that $q_n \in F$ (and hence 
$q_0 q_1 \ldots q_n$ is an accepting run on $\mathcal{A}_\varphi$), in which case we 
say that $r^{\mathcal{C}_p}_{\mathcal{M}_p}$ is accepting. Therefore, each accepting path 
of $\mathcal{M}_p$ uniquely corresponds to a path of $\mathcal{M}$ whose word 
satisfies $\varphi$. In addition, a control policy $\mathcal{C}_p$ on $\mathcal{M}_p$ induces 
the corresponding control policy $\mathcal{C}$ on $\mathcal{M}$. The details for 
generating $\mathcal{C}$ from $\mathcal{C}_p$ can be found, e.g., in [3], [10]. 

Based on this argument, our control policy synthesis 
problem defined in Section III can be reduced to computing 
a control policy for $\mathcal{M}_p$ that maximizes the probability of 
reaching a state in $B_p = \{\langle s, q \rangle \in S_p \mid q \in F\}$. 



C. Control Policy Synthesis for Product MDP 

For each $s \in S_p$, let $x_s$ denote the maximum probability 
of reaching a state in $B_p$, starting from $s$. Formally, $x_s = 
\sup_{\mathcal{C}} Pr^{\mathcal{C}}_{\mathcal{M}_p}(s \models \Diamond B_p)$, where, with an abuse of notation, 
$B_p$ in $\Diamond B_p$ is a proposition that is satisfied by all states 
in $B_p$. There are two main techniques for computing the 
probability $x_s$ for each $s \in S_p$: linear programming (LP) and 
value iteration. LP-based techniques yield an exact solution 
but typically do not scale as well as value iteration. 
On the other hand, value iteration is an iterative numerical 
technique. This method works by successively computing the 
probability vector $(x_s^{(k)})_{s \in S_p}$ for increasing $k \geq 0$ such that 
$\lim_{k \to \infty} x_s^{(k)} = x_s$ for all $s \in S_p$. Initially, we set $x_s^{(0)} = 1$ 
if $s \in B_p$ and $x_s^{(0)} = 0$ otherwise. In the $(k+1)$th iteration 
where $k \geq 0$, we set 

$$x_s^{(k+1)} = \begin{cases} 1 & \text{if } s \in B_p \\ \max_{a \in Act_p(s)} \sum_{t \in S_p} P_p(s, a, t)\, x_t^{(k)} & \text{otherwise.} \end{cases} \qquad (3)$$ 

In practice, we terminate the computation and say 
that $x_s^{(k)}$ converges when a termination criterion such as 
$\max_{s \in S_p} |x_s^{(k+1)} - x_s^{(k)}| < \epsilon$ is satisfied for some fixed 
(typically very small) threshold $\epsilon$. 
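
The update (3) together with the termination criterion can be sketched as follows. The encoding of the MDP (a map from each state to its enabled actions and a map from (state, action) to a successor distribution) and the three-state example are assumptions for illustration.

```python
def max_reach_value_iteration(states, actions, P, goal, eps=1e-10, max_iter=100000):
    """Approximate the maximum probability of reaching `goal` from each state.

    actions: state -> list of enabled actions.
    P:       (state, action) -> dict mapping successor state to probability.
    """
    x = {s: 1.0 if s in goal else 0.0 for s in states}
    for _ in range(max_iter):
        new_x = {}
        for s in states:
            if s in goal:
                new_x[s] = 1.0
            else:
                # Best one-step lookahead over the enabled actions, as in (3).
                new_x[s] = max(
                    (sum(p * x[t] for t, p in P[(s, a)].items())
                     for a in actions.get(s, [])),
                    default=0.0,
                )
        diff = max(abs(new_x[s] - x[s]) for s in states)
        x = new_x
        if diff < eps:
            break
    return x


# Hypothetical example: "retry" loops back with probability 0.5.
states = ["s0", "goal", "sink"]
actions = {"s0": ["safe", "retry"], "sink": ["stay"]}
P = {
    ("s0", "safe"): {"goal": 0.5, "sink": 0.5},
    ("s0", "retry"): {"s0": 0.5, "goal": 0.4, "sink": 0.1},
    ("sink", "stay"): {"sink": 1.0},
}
x = max_reach_value_iteration(states, actions, P, {"goal"})
```

In this example the fixed point for `s0` solves $x = 0.5x + 0.4$, i.e., $x = 0.8$, which exceeds the value 0.5 of the action `safe`.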

As discussed in [15], [16], decomposition of $\mathcal{M}_p$ into 
strongly connected components (SCC) can help speed up 
value iteration. $C \subseteq S_p$ is an SCC of $\mathcal{M}_p$ if there is a 
path in $\mathcal{M}_p$ between any two states in $C$ and $C$ is maximal 
(i.e., there does not exist any $C' \subseteq S_p$ such that $C \subset C'$ and 
$C'$ is an SCC). The algorithm proposed in [17] allows us to 
identify all the SCCs of $\mathcal{M}_p$ with time and space complexity 
that is linear in the size of $\mathcal{M}_p$. 



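The SCC-based value iteration works as follows. First, 
we set $x_s^{(0)} = 1$ if $s \in B_p$ and $x_s^{(0)} = 0$ otherwise.² 
Next, we identify all the SCCs $C_1^{\mathcal{M}_p}, \ldots, C_m^{\mathcal{M}_p}$ of $\mathcal{M}_p$. 
From the definition of SCC, we get that $C_i^{\mathcal{M}_p} \cap C_j^{\mathcal{M}_p} = 
\emptyset$ for all $i \neq j$ and $\bigcup_i C_i^{\mathcal{M}_p} = S_p$. For each SCC $C_i^{\mathcal{M}_p}$, we define 
$Succ(C_i^{\mathcal{M}_p}) \subseteq S_p \setminus C_i^{\mathcal{M}_p}$ to be the set of all the immediate 
successors of states in $C_i^{\mathcal{M}_p}$ that are not in $C_i^{\mathcal{M}_p}$. A (strict) 
partial order, $\prec_{\mathcal{M}_p}$, among $C_1^{\mathcal{M}_p}, \ldots, C_m^{\mathcal{M}_p}$ can be defined 
such that $C_j^{\mathcal{M}_p} \prec_{\mathcal{M}_p} C_i^{\mathcal{M}_p}$ if $Succ(C_i^{\mathcal{M}_p}) \cap C_j^{\mathcal{M}_p} \neq \emptyset$. (Note 
that from the definition of SCC and $Succ$, there cannot be 
cyclic dependency among SCCs; hence, such a partial order 
can always be defined.) 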

An important property of SCCs and their partial order that 
we will exploit in the computation of the probability vector 
$(x_s)_{s \in S_p}$ is that the probability values of states in $C_i^{\mathcal{M}_p}$ 
can be affected only by the probability values of states in 
$C_i^{\mathcal{M}_p}$ and all $C_j^{\mathcal{M}_p} \prec_{\mathcal{M}_p} C_i^{\mathcal{M}_p}$. Thus, our next step is to 
generate an order $\mathcal{O}_{\mathcal{M}_p}$ among $C_1^{\mathcal{M}_p}, \ldots, C_m^{\mathcal{M}_p}$ such that 
$C_i^{\mathcal{M}_p}$ appears before $C_j^{\mathcal{M}_p}$ in $\mathcal{O}_{\mathcal{M}_p}$ if $C_i^{\mathcal{M}_p} \prec_{\mathcal{M}_p} C_j^{\mathcal{M}_p}$. 
We can then process each SCC separately, according to the 
order in $\mathcal{O}_{\mathcal{M}_p}$, since the probability values of states in $C_j^{\mathcal{M}_p}$ 
that appears after $C_i^{\mathcal{M}_p}$ in $\mathcal{O}_{\mathcal{M}_p}$ cannot affect the probability 
values of states in $C_i^{\mathcal{M}_p}$. Processing of SCC $C_i^{\mathcal{M}_p}$ terminates 
at the $k$th iteration where $x_s^{(k)}$ for all $s \in C_i^{\mathcal{M}_p}$ converges. Let 
$x_s$ be the value to which $x_s^{(k)}$ converges. When processing 
$C_i^{\mathcal{M}_p}$, we exploit the order in $\mathcal{O}_{\mathcal{M}_p}$ and existing values of 
$x_t$ for all $t \in Succ(C_i^{\mathcal{M}_p})$ to determine the set of $s \in C_i^{\mathcal{M}_p}$ 
where $x_s^{(k+1)}$ needs to be updated from $x_s^{(k)}$. The formula 
in (3) with $x_t^{(k)}$ replaced by $x_t$ for all $t \in Succ(C_i^{\mathcal{M}_p})$ can 
be used to update those $x_s^{(k+1)}$. We refer the reader to [15], 
[16] for more details. 

Note that computation of an order $\mathcal{O}_{\mathcal{M}_p}$ requires $O(|S_p|^2)$ 
time. Thus, the pre-computation required by the SCC-based 
value iteration can be computationally expensive, unless all 
the SCCs of $\mathcal{M}_p$ and an order $\mathcal{O}_{\mathcal{M}_p}$ are provided a priori. 
As a result, the SCC-based value iteration may require more 
computation time than the normal value iteration, if the 
pre-computation time is also taken into account. 
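
To make the notion of a processing order concrete, the sketch below decomposes a directed graph into SCCs with a recursive depth-first search in the style of Tarjan's algorithm. This is a standard technique used here as an illustration, not necessarily the procedure of [15], [16] or [17]; the successor-map encoding and the example graph are assumptions. In this formulation a component is emitted only after every component reachable from it, so the emission order respects the dependency described above.

```python
def sccs_in_processing_order(succ):
    """Return the SCCs of a directed graph given as node -> list of successors.

    Every node must appear as a key. Components are emitted so that each one
    comes after all components it can reach (successors are processed first).
    The search is recursive, so it is only suitable for small graphs.
    """
    index, low = {}, {}
    stack, on_stack = [], set()
    sccs = []
    counter = [0]

    def visit(v):
        index[v] = low[v] = counter[0]
        counter[0] += 1
        stack.append(v)
        on_stack.add(v)
        for w in succ.get(v, ()):
            if w not in index:
                visit(w)
                low[v] = min(low[v], low[w])
            elif w in on_stack:
                low[v] = min(low[v], index[w])
        if low[v] == index[v]:
            # v is the root of a component: pop it off the stack.
            component = []
            while True:
                w = stack.pop()
                on_stack.discard(w)
                component.append(w)
                if w == v:
                    break
            sccs.append(component)

    for v in succ:
        if v not in index:
            visit(v)
    return sccs


# Hypothetical graph: {a, b} -> {c, d} -> {goal}.
succ = {
    "a": ["b"],
    "b": ["a", "c"],
    "c": ["d"],
    "d": ["c", "goal"],
    "goal": ["goal"],
}
order = [set(c) for c in sccs_in_processing_order(succ)]
```

Value iteration can then be run on one component at a time, in the emitted order, treating the already converged values of successor components as constants.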

Once the vector $(x_s)_{s \in S_p}$ is computed, a memoryless 
control policy $\mathcal{C}$ such that for any $s \in S_p$, $Pr^{\mathcal{C}}_{\mathcal{M}_p}(s \models \Diamond B_p) = 
x_s$ can be constructed as follows. For each state $s \in S_p$, 
let $Act_p^{max}(s) \subseteq Act_p(s)$ be the set of actions such that 
for all $a \in Act_p^{max}(s)$, $x_s = \sum_{t \in S_p} P_p(s, a, t)\, x_t$. For each 
$s \in S_p$ with $x_s > 0$, let $\|s\|$ be the length of a shortest 
path from $s$ to a state in $B_p$, using only actions in $Act_p^{max}$. 
$\mathcal{C}(s) \in Act_p^{max}(s)$ for a state $s \in S_p \setminus B_p$ with $x_s > 0$ is 
then chosen such that $P_p(s, \mathcal{C}(s), t) > 0$ for some $t \in S_p$ 
with $\|t\| = \|s\| - 1$. For a state $s \in S_p$ with $x_s = 0$ or a 
state $s \in B_p$, $\mathcal{C}(s) \in Act_p(s)$ can be chosen arbitrarily. 
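
The construction above can be sketched as a layered backward search from $B_p$ that only uses value-preserving actions. The data encoding, the numerical tolerance `tol` and the example are assumptions; the example shows why the shortest-path condition matters, since an action that loops forever can also preserve the value.

```python
def extract_policy(states, actions, P, goal, x, tol=1e-9):
    """Build a memoryless policy from the converged values x.

    actions: state -> list of enabled actions.
    P:       (state, action) -> dict mapping successor state to probability.
    """
    # Actions whose one-step expectation equals x_s (up to a tolerance).
    act_max = {
        s: [a for a in actions.get(s, [])
            if abs(sum(p * x[t] for t, p in P[(s, a)].items()) - x[s]) <= tol]
        for s in states
    }
    dist = {s: 0 for s in goal}  # the length ||s|| for states handled so far
    policy = {}
    level = 0
    while True:
        layer = {}
        for s in states:
            if s in dist or x[s] <= 0.0:
                continue
            for a in act_max[s]:
                # Pick an action that can step to a state one layer closer.
                if any(p > 0.0 and dist.get(t) == level
                       for t, p in P[(s, a)].items()):
                    layer[s] = a
                    break
        if not layer:
            break
        for s, a in layer.items():
            dist[s] = level + 1
            policy[s] = a
        level += 1
    # States in the goal set or with value 0: any enabled action will do.
    for s in states:
        if s not in policy and actions.get(s):
            policy[s] = actions[s][0]
    return policy


# Hypothetical example: "wait" preserves the value 1 but never reaches the goal.
states = ["s0", "goal", "sink"]
actions = {"s0": ["wait", "go"], "goal": ["stay"], "sink": ["stay"]}
P = {
    ("s0", "wait"): {"s0": 1.0},
    ("s0", "go"): {"goal": 1.0},
    ("goal", "stay"): {"goal": 1.0},
    ("sink", "stay"): {"sink": 1.0},
}
x = {"s0": 1.0, "goal": 1.0, "sink": 0.0}
policy = extract_policy(states, actions, P, {"goal"}, x)
```

Choosing `wait` in `s0` would satisfy $x_s = \sum_t P_p(s, a, t) x_t$ yet reach the goal with probability 0, which is exactly what the distance condition rules out.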



²In the original algorithm, all the states $s \in S_p$ with $x_s = 1$ and all the 
states that cannot reach $B_p$ under any control policy need to be identified but 
it has been shown in [16] that this step is not necessary for the correctness 
of the algorithm. 



V. Incremental Computation of Control Policies 

Automatic synthesis described in the previous section 
suffers from the state explosion problem as the composition 
of $\mathcal{T}$ and all $\mathcal{M}_1, \ldots, \mathcal{M}_N$ needs to be constructed, leading 
to an exponential blow up of the state space. In this section, 
we propose an incremental synthesis approach where we 
progressively compute a sequence of control policies, taking 
into account only a small subset of the environment agents 
initially and successively add more agents to the synthesis 
procedure in each iteration until we hit the computational 
resource constraints. Hence, even though the complete syn- 
thesis problem cannot be solved due to the computational 
resource limitation, we can still obtain a reasonably good 
control policy. 

A. Overview of Incremental Computation of Control Policies 

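Initially, we consider a small subset $\mathbb{M}_0 \subseteq 
\{\mathcal{M}_1, \ldots, \mathcal{M}_N\}$ of the environment agents. For each 
$\mathcal{M}_i = (S_i, P_i, \iota_{init,i}, \Pi_i, L_i) \notin \mathbb{M}_0$, we consider a 
simplified model $\tilde{\mathcal{M}}_i$ that essentially assumes that the 
$i$th environment agent is stationary (i.e., we take into 
account their presence but do not consider their full model). 
Formally, $\tilde{\mathcal{M}}_i = (\{\tilde{s}_i\}, \tilde{P}_i, \tilde{\iota}_{init,i}, \Pi_i, \tilde{L}_i)$ where $\tilde{s}_i \in S_i$ 
can be chosen arbitrarily, $\tilde{P}_i(\tilde{s}_i, \tilde{s}_i) = 1$, $\tilde{\iota}_{init,i}(\tilde{s}_i) = 1$ and 
$\tilde{L}_i(\tilde{s}_i) = L_i(\tilde{s}_i)$. Note that the choice of $\tilde{s}_i \in S_i$ may affect 
the performance of our incremental synthesis algorithm; 
hence, it should be chosen such that it is the most likely state 
of $\mathcal{M}_i$. We let $\tilde{\mathbb{M}}_0 = \{\tilde{\mathcal{M}}_i \mid \mathcal{M}_i \in \{\mathcal{M}_1, \ldots, \mathcal{M}_N\} \setminus \mathbb{M}_0\}$. 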
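
A minimal sketch of the simplified stationary model is given below. The text leaves open how "the most likely state" is determined; picking the state with the largest initial probability is only one possible reading and is an assumption of this sketch, as are the dictionary encoding and the example chain.

```python
def stationary_model(mc, state=None):
    """One-state Markov chain that freezes an agent in a single state.

    mc: dict with 'states', 'P' ((s, s') -> prob), 'init' (s -> prob), 'L'.
    If `state` is None, the state with the largest initial probability is
    used (an assumption about what "most likely" means).
    """
    if state is None:
        state = max(mc["init"], key=mc["init"].get)
    return {
        "states": [state],
        "P": {(state, state): 1.0},       # the agent never moves
        "init": {state: 1.0},
        "L": {state: mc["L"][state]},     # the label is inherited
    }


# Hypothetical pedestrian that most likely starts in cell c1.
pedestrian = {
    "states": ["c0", "c1"],
    "P": {("c0", "c0"): 0.4, ("c0", "c1"): 0.6, ("c1", "c1"): 1.0},
    "init": {"c0": 0.3, "c1": 0.7},
    "L": {"c0": frozenset({"p_c0"}), "c1": frozenset({"p_c1"})},
}
frozen = stationary_model(pedestrian)
```

Because the simplified model has a single state, composing it with the plant does not enlarge the state space; it only adds the agent's label to every state.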

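The composition of $\mathcal{T}$, all $\mathcal{M}_i \in \mathbb{M}_0$ and all $\tilde{\mathcal{M}}_j \in \tilde{\mathbb{M}}_0$ is 
then constructed. We let $\mathcal{M}^{\mathbb{M}_0}$ be the MDP that represents 
such composition. Note that since $\tilde{\mathcal{M}}_i$ is typically smaller than 
$\mathcal{M}_i$, $\mathcal{M}^{\mathbb{M}_0}$ is typically much smaller than the composition 
of $\mathcal{T}, \mathcal{M}_1, \ldots, \mathcal{M}_N$. We identify all the SCCs of $\mathcal{M}^{\mathbb{M}_0}$ 
and their partial order. Following the steps for synthesizing 
a control policy described in Section IV, we construct 
$\mathcal{M}_p^{\mathbb{M}_0} = \mathcal{M}^{\mathbb{M}_0} \otimes \mathcal{A}_\varphi$ where $\mathcal{A}_\varphi = (Q, 2^\Pi, \delta, q_{init}, F)$ is 
a DFA that recognizes the good prefixes of $\varphi$. We also 
store the intermediate transition probability function and the 
intermediate initial state distribution for $\mathcal{M}_p^{\mathbb{M}_0}$ and denote 
these functions by $\tilde{P}_p^{\mathbb{M}_0}$ and $\tilde{\iota}_{p,init}^{\mathbb{M}_0}$, respectively. 

At the end of the initialization period (i.e., the 0th 
iteration), we obtain a control policy $\mathcal{C}^{\mathbb{M}_0}$ that maximizes 
the probability for $\mathcal{M}^{\mathbb{M}_0}$ to satisfy $\varphi$. $\mathcal{C}^{\mathbb{M}_0}$ resolves all 
nondeterministic choices in $\mathcal{M}^{\mathbb{M}_0}$ and induces a Markov 
chain, which we denote by $\mathcal{M}^{\mathbb{M}_0}_{\mathcal{C}^{\mathbb{M}_0}}$. 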

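Our algorithm then successively adds more full models of 
the rest of the environment agents to the synthesis procedure 
at each iteration. In the $(k+1)$th iteration where $k \geq 
0$, we consider $\mathbb{M}_{k+1} = \mathbb{M}_k \cup \{\mathcal{M}_l\}$ for some $\mathcal{M}_l \in 
\{\mathcal{M}_1, \ldots, \mathcal{M}_N\} \setminus \mathbb{M}_k$. Such $\mathcal{M}_l$ may be picked such that 
the probability for $\mathcal{M}^{\mathbb{M}_k}_{\mathcal{C}^{\mathbb{M}_k}} \| \mathcal{M}_l$ to satisfy $\varphi$ is the minimum 
among all candidates in $\{\mathcal{M}_1, \ldots, \mathcal{M}_N\} \setminus \mathbb{M}_k$. This probability 
can be efficiently computed using probabilistic verification 
[3]. (As an MC can be considered a special case of MDP 
with exactly one action enabled in each state, we can easily 
adapt the techniques for computing the probability vector of 
a product MDP described in Section IV-C to compute the 
probability that $\mathcal{M}^{\mathbb{M}_k}_{\mathcal{C}^{\mathbb{M}_k}} \| \mathcal{M}_l$ satisfies $\varphi$.) We let $\tilde{\mathbb{M}}_{k+1} = 
\tilde{\mathbb{M}}_k \setminus \{\tilde{\mathcal{M}}_l\}$ and let $\mathcal{M}^{\mathbb{M}_{k+1}}$ be the MDP that represents 
the composition of $\mathcal{T}$, all $\mathcal{M}_i \in \mathbb{M}_{k+1}$ and all $\tilde{\mathcal{M}}_j \in 
\tilde{\mathbb{M}}_{k+1}$. Next, we construct $\mathcal{M}_p^{\mathbb{M}_{k+1}} = \mathcal{M}^{\mathbb{M}_{k+1}} \otimes \mathcal{A}_\varphi$ and 
obtain a control policy $\mathcal{C}^{\mathbb{M}_{k+1}}$ that maximizes the probability 
for $\mathcal{M}^{\mathbb{M}_{k+1}}$ to satisfy $\varphi$. Similar to the initialization step, 
during the construction of $\mathcal{M}_p^{\mathbb{M}_{k+1}}$, we store the intermediate 
transition probability function and the intermediate initial 
state distribution for $\mathcal{M}_p^{\mathbb{M}_{k+1}}$ and denote these functions by 
$\tilde{P}_p^{\mathbb{M}_{k+1}}$ and $\tilde{\iota}_{p,init}^{\mathbb{M}_{k+1}}$, respectively. 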



The process outlined in the previous paragraph terminates 
at the $K$th iteration where $\mathbb{M}_K = \{\mathcal{M}_1, \ldots, \mathcal{M}_N\}$ or 
when the computational resource constraints are exceeded. 
To make this process more efficient, we avoid unnecessary 
computation and exploit the objects computed in the previous 
iteration. Consider an arbitrary iteration $k \geq 0$. In Section 
V-B, we show how $\mathcal{M}_p^{\mathbb{M}_{k+1}}$, $\tilde{P}_p^{\mathbb{M}_{k+1}}$ and $\tilde{\iota}_{p,init}^{\mathbb{M}_{k+1}}$ can be 
incrementally constructed from $\mathcal{M}_p^{\mathbb{M}_k}$, $\tilde{P}_p^{\mathbb{M}_k}$ and $\tilde{\iota}_{p,init}^{\mathbb{M}_k}$. 
Hence, we can avoid computing $\mathcal{M}^{\mathbb{M}_{k+1}}$. In addition, as 
previously discussed in Section IV-C, generating an order 
of SCCs can be computationally expensive. Hence, we 
only compute the SCCs and their order for $\mathcal{M}^{\mathbb{M}_0}$ and all 
$\mathcal{M}_j \in \{\mathcal{M}_1, \ldots, \mathcal{M}_N\} \setminus \mathbb{M}_0$, which are typically small. 
Incremental construction of SCCs of $\mathcal{M}^{\mathbb{M}_{k+1}}$ and their order 
from those of $\mathcal{M}^{\mathbb{M}_k}$ is considered in Section V-C. (Note 
that we do not compute $\mathcal{M}^{\mathbb{M}_k}$ but only maintain its SCCs 
and their order, which are incrementally constructed using 
the results from the previous iteration.) Finally, Section V-D 
describes computation of $\mathcal{C}^{\mathbb{M}_k}$, using a method adapted from 
SCC-based value iteration where we avoid having to identify 
the SCCs of $\mathcal{M}_p^{\mathbb{M}_k}$ and their order. Instead, we exploit the 
SCCs of $\mathcal{M}^{\mathbb{M}_k}$ and their order, which can be incrementally 
constructed using the approach described in Section V-C. 



B. Incremental Construction of Product MDP 

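For an iteration $k \geq 0$, let $\mathbb{M}_{k+1} = \mathbb{M}_k \cup \{\mathcal{M}_l\}$ 
for some $\mathcal{M}_l \in \{\mathcal{M}_1, \ldots, \mathcal{M}_N\} \setminus \mathbb{M}_k$. In general, one 
can construct $\mathcal{M}_p^{\mathbb{M}_{k+1}}$ by first computing $\mathcal{M}^{\mathbb{M}_{k+1}}$, which 
requires taking the composition of a DFTS or an MDP with 
$N$ MCs, and then constructing $\mathcal{M}^{\mathbb{M}_{k+1}} \otimes \mathcal{A}_\varphi$. To accelerate 
the process of computing $\mathcal{M}_p^{\mathbb{M}_{k+1}}$, we exploit the presence of 
$\mathcal{M}_p^{\mathbb{M}_k}$, its intermediate transition probability function $\tilde{P}_p^{\mathbb{M}_k}$ 
and intermediate initial state distribution $\tilde{\iota}_{p,init}^{\mathbb{M}_k}$, which are 
computed in the previous iteration. 

First, note that a state $s_p$ of $\mathcal{M}_p^{\mathbb{M}_k}$ is of the form $s_p = 
\langle s, q \rangle$ where $s = \langle s_0, s_1, \ldots, s_N \rangle \in S_0 \times S_1 \times \ldots \times S_N$ 
and $q \in Q$. For $s = \langle s_0, s_1, \ldots, s_N \rangle \in S_0 \times S_1 \times \ldots \times 
S_N$, $i \in \{0, \ldots, N\}$ and $r \in S_i$, we define $s|_{i \leftarrow r} = 
\langle s_0, \ldots, s_{i-1}, r, s_{i+1}, \ldots, s_N \rangle$, i.e., $s|_{i \leftarrow r}$ is obtained by 
replacing the $i$th element of $s$ by $r$. 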

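Lemma 1: Consider an arbitrary iteration $k \geq 0$. Let
$\mathbb{M}_{k+1} = \mathbb{M}_k \cup \{\mathcal{M}_l\}$ where $\mathcal{M}_l \in \{\mathcal{M}_1, \ldots, \mathcal{M}_N\} \setminus \mathbb{M}_k$.
Suppose $\mathcal{M}_p^{\mathbb{M}_k} = (S_p^{\mathbb{M}_k}, Act_p^{\mathbb{M}_k}, P_p^{\mathbb{M}_k}, \iota_{p,init}^{\mathbb{M}_k}, \Pi_p^{\mathbb{M}_k}, L_p^{\mathbb{M}_k})$
and $\mathcal{M}_l = (S_l, P_l, \iota_{init,l}, \Pi_l, L_l)$. Assuming that for
any $i, j \in \{0, \ldots, N\}$ with $i \neq j$, $\Pi_i \cap \Pi_j = \emptyset$, then
$\mathcal{M}_p^{\mathbb{M}_{k+1}} = (S_p^{\mathbb{M}_{k+1}}, Act_p^{\mathbb{M}_{k+1}}, P_p^{\mathbb{M}_{k+1}}, \iota_{p,init}^{\mathbb{M}_{k+1}}, \Pi_p^{\mathbb{M}_{k+1}}, L_p^{\mathbb{M}_{k+1}})$ where

$$S_p^{\mathbb{M}_{k+1}} = \{\langle s|_{l \leftarrow r}, q \rangle \mid \langle s, q \rangle \in S_p^{\mathbb{M}_k} \text{ and } r \in S_l\},$$

$Act_p^{\mathbb{M}_{k+1}} = Act_p^{\mathbb{M}_k}$, $\Pi_p^{\mathbb{M}_{k+1}} = \Pi_p^{\mathbb{M}_k}$, and for any
$s = \langle s_0, \ldots, s_N \rangle, s' = \langle s'_0, \ldots, s'_N \rangle \in S_0 \times \ldots \times S_N$ and
$q, q' \in Q$,

$$P_p^{\mathbb{M}_{k+1}}(\langle s, q \rangle, \alpha, \langle s', q' \rangle) =
\begin{cases}
\tilde{P}_p^{\mathbb{M}_{k+1}}(\langle s, q \rangle, \alpha, \langle s', q' \rangle) & \text{if } q' = \delta(q, L_p^{\mathbb{M}_{k+1}}(\langle s', q' \rangle)) \\
0 & \text{otherwise}
\end{cases}$$

where the intermediate transition probability function
is given by

$$\tilde{P}_p^{\mathbb{M}_{k+1}}(\langle s, q \rangle, \alpha, \langle s', q' \rangle) = P_l(s_l, s'_l) \, \tilde{P}_p^{\mathbb{M}_k}(\langle \tilde{s}, q \rangle, \alpha, \langle \tilde{s}', q' \rangle) \qquad (4)$$

for any $\langle \tilde{s}, q \rangle, \langle \tilde{s}', q' \rangle \in S_p^{\mathbb{M}_k}$ such that $\tilde{s}|_{l \leftarrow s_l} = s$ and
$\tilde{s}'|_{l \leftarrow s'_l} = s'$,

$$\iota_{p,init}^{\mathbb{M}_{k+1}}(\langle s, q \rangle) =
\begin{cases}
\tilde{\iota}_{p,init}^{\mathbb{M}_{k+1}}(\langle s, q \rangle) & \text{if } q = \delta(q_{init}, L_p^{\mathbb{M}_{k+1}}(\langle s, q \rangle)) \\
0 & \text{otherwise}
\end{cases}$$

where the intermediate initial state distribution is given
by

$$\tilde{\iota}_{p,init}^{\mathbb{M}_{k+1}}(\langle s, q \rangle) = \iota_{init,l}(s_l) \, \tilde{\iota}_{p,init}^{\mathbb{M}_k}(\langle \tilde{s}, q \rangle) \qquad (5)$$

for any $\langle \tilde{s}, q \rangle \in S_p^{\mathbb{M}_k}$ such that $\tilde{s}|_{l \leftarrow s_l} = s$, and
$L_p^{\mathbb{M}_{k+1}}(\langle s, q \rangle) = (L_p^{\mathbb{M}_k}(\langle \tilde{s}, q \rangle) \setminus L_l(\tilde{s}_l)) \cup L_l(s_l)$ for
any $\langle \tilde{s}, q \rangle \in S_p^{\mathbb{M}_k}$ such that $\tilde{s}|_{l \leftarrow s_l} = s$.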
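Proof: The correctness of $S_p^{\mathbb{M}_{k+1}}$, $Act_p^{\mathbb{M}_{k+1}}$, $\Pi_p^{\mathbb{M}_{k+1}}$
and $L_p^{\mathbb{M}_{k+1}}$ is straightforward to verify. Hence, we will only
provide the proof for the correctness of $P_p^{\mathbb{M}_{k+1}}$ and $\tilde{P}_p^{\mathbb{M}_{k+1}}$.
The correctness of $\iota_{p,init}^{\mathbb{M}_{k+1}}$ and $\tilde{\iota}_{p,init}^{\mathbb{M}_{k+1}}$ can be proved in a
similar way.

Consider an arbitrary iteration $k \geq 0$ and let
$\mathcal{M}^{\mathbb{M}_k} = (S^{\mathbb{M}_k}, Act^{\mathbb{M}_k}, P^{\mathbb{M}_k}, \iota_{init}^{\mathbb{M}_k}, \Pi^{\mathbb{M}_k}, L^{\mathbb{M}_k})$ and
$\mathcal{M}^{\mathbb{M}_{k+1}} = (S^{\mathbb{M}_{k+1}}, Act^{\mathbb{M}_{k+1}}, P^{\mathbb{M}_{k+1}}, \iota_{init}^{\mathbb{M}_{k+1}}, \Pi^{\mathbb{M}_{k+1}}, L^{\mathbb{M}_{k+1}})$.
It is obvious from the definition of product MDP that
$P_p^{\mathbb{M}_{k+1}}$ is correct as long as $\tilde{P}_p^{\mathbb{M}_{k+1}}$ is correct, i.e.,
$\tilde{P}_p^{\mathbb{M}_{k+1}}(\langle s, q \rangle, \alpha, \langle s', q' \rangle) = P^{\mathbb{M}_{k+1}}(s, \alpha, s')$ for all
$\langle s, q \rangle, \langle s', q' \rangle \in S_p^{\mathbb{M}_{k+1}}$ and $\alpha \in Act^{\mathbb{M}_{k+1}}$. Hence, we only
need to prove the correctness of $\tilde{P}_p^{\mathbb{M}_{k+1}}$.

Assume that $\tilde{P}_p^{\mathbb{M}_k}$ is correct, i.e.,
$\tilde{P}_p^{\mathbb{M}_k}(\langle s, q \rangle, \alpha, \langle s', q' \rangle) = P^{\mathbb{M}_k}(s, \alpha, s')$ for all
$\langle s, q \rangle, \langle s', q' \rangle \in S_p^{\mathbb{M}_k}$ and $\alpha \in Act^{\mathbb{M}_k}$. Let $l$ be the
index such that $\mathbb{M}_{k+1} = \mathbb{M}_k \cup \{\mathcal{M}_l\}$. Consider arbitrary
$\langle s, q \rangle, \langle s', q' \rangle \in S_p^{\mathbb{M}_{k+1}}$ and $\alpha \in Act_p^{\mathbb{M}_{k+1}}$. Suppose
$s = \langle s_0, \ldots, s_N \rangle$ and $s' = \langle s'_0, \ldots, s'_N \rangle$. Note that since
$\tilde{\mathcal{M}}_l$ only contains one state, there exists exactly one
$\langle \tilde{s}, q \rangle \in S_p^{\mathbb{M}_k}$ and exactly one $\langle \tilde{s}', q' \rangle \in S_p^{\mathbb{M}_k}$ such that
$\tilde{s}|_{l \leftarrow s_l} = s$ and $\tilde{s}'|_{l \leftarrow s'_l} = s'$. Since $\mathcal{M}^{\mathbb{M}_k}$ is the composition
of $\mathcal{T}$, all $\mathcal{M}_i \in \mathbb{M}_k$ and all $\tilde{\mathcal{M}}_j$ with $\mathcal{M}_j \notin \mathbb{M}_k$, and since $\mathcal{M}_l \notin \mathbb{M}_k$
and $\tilde{P}_l(\cdot, \cdot) = 1$, it follows that if $\mathcal{T}$ is a DFTS, then

$$P^{\mathbb{M}_k}(\tilde{s}, \alpha, \tilde{s}') =
\begin{cases}
\prod_{i \in \{1, \ldots, N\} \setminus \{l\}} P_i^{\mathbb{M}_k}(s_i, s'_i) & \text{if } s_0 \xrightarrow{\alpha} s'_0 \\
0 & \text{otherwise}
\end{cases}$$

and if $\mathcal{T}$ is an MDP, then

$$P^{\mathbb{M}_k}(\tilde{s}, \alpha, \tilde{s}') = P_0(s_0, \alpha, s'_0) \prod_{i \in \{1, \ldots, N\} \setminus \{l\}} P_i^{\mathbb{M}_k}(s_i, s'_i),$$

where, for each $i$, $P_i^{\mathbb{M}_k}$ denotes the transition probability function of the
model used for agent $i$ in iteration $k$, i.e., $P_i$ if $\mathcal{M}_i \in \mathbb{M}_k$ and $\tilde{P}_i$ otherwise.
Thus, $P^{\mathbb{M}_{k+1}}(s, \alpha, s') = P_l(s_l, s'_l) P^{\mathbb{M}_k}(\tilde{s}, \alpha, \tilde{s}')$. Combining
this with (4), we get

$$\begin{aligned}
\tilde{P}_p^{\mathbb{M}_{k+1}}(\langle s, q \rangle, \alpha, \langle s', q' \rangle)
&= P_l(s_l, s'_l) \, \tilde{P}_p^{\mathbb{M}_k}(\langle \tilde{s}, q \rangle, \alpha, \langle \tilde{s}', q' \rangle) \\
&= P_l(s_l, s'_l) \, P^{\mathbb{M}_k}(\tilde{s}, \alpha, \tilde{s}') \\
&= P^{\mathbb{M}_{k+1}}(s, \alpha, s').
\end{aligned}$$

By definition, we can conclude that $\tilde{P}_p^{\mathbb{M}_{k+1}}$ is correct. ■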
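As an illustration of the update in (4), the following Python sketch extends a stored intermediate transition function by one newly added agent. It is a minimal sketch under assumptions: the dictionary encoding, the names `extend_intermediate` and `replace_at`, and the toy states and probabilities are hypothetical and are not taken from the paper.

```python
def replace_at(s, i, r):
    """Return s|_{i<-r} for a composite state stored as a tuple."""
    return s[:i] + (r,) + s[i + 1:]


def extend_intermediate(P_prev, P_l, l):
    """Apply (4): scale every stored transition by P_l(s_l, s_l').

    P_prev maps ((s, q), action, (s', q')) to a probability (iteration k).
    P_l maps (r, r') to the transition probability of the added agent l.
    """
    P_next = {}
    for ((s, q), a, (s2, q2)), p in P_prev.items():
        for (r, r2), p_l in P_l.items():
            prob = p_l * p
            if prob > 0.0:
                src = (replace_at(s, l, r), q)
                dst = (replace_at(s2, l, r2), q2)
                P_next[(src, a, dst)] = prob
    return P_next


# Hypothetical data: the robot (index 0) moves from c0 to c1 under "go";
# agent 1 is still stationary at its single placeholder state "x".
P_prev = {((("c0", "x"), "q0"), "go", (("c1", "x"), "q0")): 1.0}
# Hypothetical two-state Markov chain for agent 1.
P_1 = {("a", "a"): 0.7, ("a", "b"): 0.3, ("b", "b"): 1.0}
P_next = extend_intermediate(P_prev, P_1, 1)
for key, prob in sorted(P_next.items()):
    print(key, prob)
```

Only the stored entries of the previous iteration are visited, so the full composition of the robot with all agents is never rebuilt.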

C. Incremental Construction of SCCs 

Consider an arbitrary iteration $k \geq 0$. Let $l$ be the index
of the environment agent such that $\mathbb{M}_{k+1} = \mathbb{M}_k \cup \{\mathcal{M}_l\}$. In
this section, we first provide a way to incrementally identify
all the SCCs of $\mathcal{M}^{\mathbb{M}_{k+1}}$ from all the SCCs of $\mathcal{M}^{\mathbb{M}_k}$ and
$\mathcal{M}_l$. We conclude the section with incremental construction
of the partial order over the SCCs of $\mathcal{M}^{\mathbb{M}_{k+1}}$ from the partial
order defined over the SCCs of $\mathcal{M}^{\mathbb{M}_k}$ and $\mathcal{M}_l$.

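Lemma 2: Let $C^{\mathbb{M}_k}$ be an SCC of $\mathcal{M}^{\mathbb{M}_k}$ and $C^l$ be an
SCC of $\mathcal{M}_l$ where $\mathbb{M}_{k+1} = \mathbb{M}_k \cup \{\mathcal{M}_l\}$. Suppose either of
the following conditions holds:

Cond 1: $|C^{\mathbb{M}_k}| = 1$ and the state in $C^{\mathbb{M}_k}$ does not have
a self-loop in $\mathcal{M}^{\mathbb{M}_k}$.

Cond 2: $|C^l| = 1$ and the state in $C^l$ does not have a
self-loop in $\mathcal{M}_l$.

Then, for any $s \in C^{\mathbb{M}_k}$ and $r \in C^l$, $\{s|_{l \leftarrow r}\}$ is an SCC of
$\mathcal{M}^{\mathbb{M}_{k+1}}$. Otherwise, $\{s|_{l \leftarrow r} \mid s \in C^{\mathbb{M}_k}, r \in C^l\}$ is an SCC
of $\mathcal{M}^{\mathbb{M}_{k+1}}$.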

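Proof: First, we consider the case where Cond 1
or Cond 2 holds and consider arbitrary $s \in C^{\mathbb{M}_k}$ and
$r \in C^l$. To show that $\{s|_{l \leftarrow r}\}$ is an SCC of $\mathcal{M}^{\mathbb{M}_{k+1}}$,
we will show that there is no path from $s|_{l \leftarrow r}$ to itself in
$\mathcal{M}^{\mathbb{M}_{k+1}}$. Since Cond 1 or Cond 2 holds, either
there is no path from $s$ to itself in $\mathcal{M}^{\mathbb{M}_k}$ or there is
no path from $r$ to itself in $\mathcal{M}_l$. Assume, by contradiction,
that there is a path from $s|_{l \leftarrow r}$ to itself in $\mathcal{M}^{\mathbb{M}_{k+1}}$. Let
this path be $s|_{l \leftarrow r}, s^1, s^2, \ldots, s^n, s|_{l \leftarrow r}$ where for each
$i \in \{1, \ldots, n\}$, $s^i = \langle s^i_0, \ldots, s^i_N \rangle$. From the proof of Lemma 1,
we get that $P^{\mathbb{M}_{k+1}}(s|_{l \leftarrow r}, \alpha, s^1) = P_l(r, s^1_l) P^{\mathbb{M}_k}(s, \alpha, \tilde{s}^1)$,
$P^{\mathbb{M}_{k+1}}(s^n, \alpha, s|_{l \leftarrow r}) = P_l(s^n_l, r) P^{\mathbb{M}_k}(\tilde{s}^n, \alpha, s)$ and
$P^{\mathbb{M}_{k+1}}(s^i, \alpha, s^{i+1}) = P_l(s^i_l, s^{i+1}_l) P^{\mathbb{M}_k}(\tilde{s}^i, \alpha, \tilde{s}^{i+1})$ for all
$\alpha \in Act^{\mathbb{M}_{k+1}}$ and $i \in \{1, \ldots, n-1\}$, where for each $i \in \{1, \ldots, n\}$,
$\tilde{s}^i \in S^{\mathbb{M}_k}$ is such that $\tilde{s}^i|_{l \leftarrow s^i_l} = s^i$.

Since $s|_{l \leftarrow r}, s^1, s^2, \ldots, s^n, s|_{l \leftarrow r}$ is a path in
$\mathcal{M}^{\mathbb{M}_{k+1}}$, there exist $\alpha_0, \ldots, \alpha_n \in Act^{\mathbb{M}_{k+1}}$ such
that $P^{\mathbb{M}_{k+1}}(s|_{l \leftarrow r}, \alpha_0, s^1)$, $P^{\mathbb{M}_{k+1}}(s^n, \alpha_n, s|_{l \leftarrow r})$,
$P^{\mathbb{M}_{k+1}}(s^i, \alpha_i, s^{i+1}) > 0$ for all $i \in \{1, \ldots, n-1\}$. Thus, it
must be the case that $P_l(r, s^1_l)$, $P_l(s^n_l, r)$, $P_l(s^i_l, s^{i+1}_l) > 0$
and $P^{\mathbb{M}_k}(s, \alpha_0, \tilde{s}^1)$, $P^{\mathbb{M}_k}(\tilde{s}^n, \alpha_n, s)$, $P^{\mathbb{M}_k}(\tilde{s}^i, \alpha_i, \tilde{s}^{i+1}) > 0$
for all $i \in \{1, \ldots, n-1\}$. But then, $r, s^1_l, \ldots, s^n_l, r$ is a path
in $\mathcal{M}_l$ and $s, \tilde{s}^1, \ldots, \tilde{s}^n, s$ is a path in $\mathcal{M}^{\mathbb{M}_k}$, leading to a
contradiction.

Next, consider the case where neither Cond 1 nor Cond 2
holds. To show that $C^{\mathbb{M}_{k+1}} = \{s|_{l \leftarrow r} \mid s \in C^{\mathbb{M}_k}, r \in C^l\}$
is an SCC of $\mathcal{M}^{\mathbb{M}_{k+1}}$, we need to show that for any
$s, \tilde{s} \in C^{\mathbb{M}_{k+1}}$ and any $s' \notin C^{\mathbb{M}_{k+1}}$, (1) there is a path in
$\mathcal{M}^{\mathbb{M}_{k+1}}$ from $s$ to $\tilde{s}$, and (2) there is no path in $\mathcal{M}^{\mathbb{M}_{k+1}}$
either from $s$ to $s'$ or from $s'$ to $s$. Both of these statements
can be proved by contradiction, using the same reasoning as
in the proof above for the case where either Cond 1 or Cond
2 holds. ■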

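We say that an SCC $C^{\mathbb{M}_{k+1}}$ of $\mathcal{M}^{\mathbb{M}_{k+1}}$ is derived from
$(C^{\mathbb{M}_k}, C^l)$, where $C^{\mathbb{M}_k}$ is an SCC of $\mathcal{M}^{\mathbb{M}_k}$ and $C^l$ is an
SCC of $\mathcal{M}_l$, if $C^{\mathbb{M}_{k+1}}$ is constructed from $C^{\mathbb{M}_k}$ and $C^l$
according to Lemma 2, i.e., $C^{\mathbb{M}_{k+1}} = \{s|_{l \leftarrow r}\}$ for some
$s \in C^{\mathbb{M}_k}$ and $r \in C^l$ if Cond 1 or Cond 2 in Lemma 2
holds; otherwise, $C^{\mathbb{M}_{k+1}} = \{s|_{l \leftarrow r} \mid s \in C^{\mathbb{M}_k}, r \in C^l\}$.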

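Lemma 3: For each SCC $C^{\mathbb{M}_{k+1}}$ of $\mathcal{M}^{\mathbb{M}_{k+1}}$, there exists
a unique $(C^{\mathbb{M}_k}, C^l)$ from which $C^{\mathbb{M}_{k+1}}$ is derived.

Proof: Similar to Lemma 1, it can be checked that
$S^{\mathbb{M}_{k+1}} = \{s|_{l \leftarrow r} \mid s \in S^{\mathbb{M}_k} \text{ and } r \in S_l\}$ is the set of states
of $\mathcal{M}^{\mathbb{M}_{k+1}}$. Consider an arbitrary SCC $C^{\mathbb{M}_{k+1}}$ of $\mathcal{M}^{\mathbb{M}_{k+1}}$
and an arbitrary $s = \langle s_0, \ldots, s_N \rangle \in C^{\mathbb{M}_{k+1}}$.

By definition, for any arbitrary SCC $C'^{\mathbb{M}_k}$ of $\mathcal{M}^{\mathbb{M}_k}$ and
arbitrary SCC $C'^l$ of $\mathcal{M}_l$, $C^{\mathbb{M}_{k+1}}$ is derived from $(C'^{\mathbb{M}_k}, C'^l)$
only if $s_l \in C'^l$ and there exists $\tilde{s} \in C'^{\mathbb{M}_k}$ such that
$\tilde{s}|_{l \leftarrow s_l} = s$. But since $\tilde{\mathcal{M}}_l$ contains exactly one state, there
exists a unique $\tilde{s} \in S^{\mathbb{M}_k}$ such that $\tilde{s}|_{l \leftarrow s_l} = s$. Also, from
the definition of SCC, there exist a unique SCC $C^{\mathbb{M}_k}$ of
$\mathcal{M}^{\mathbb{M}_k}$ and a unique SCC $C^l$ of $\mathcal{M}_l$ such that $\tilde{s} \in C^{\mathbb{M}_k}$
and $s_l \in C^l$. Thus, it cannot be the case that $C^{\mathbb{M}_{k+1}}$ is
derived from $(C'^{\mathbb{M}_k}, C'^l)$ where $C'^{\mathbb{M}_k} \neq C^{\mathbb{M}_k}$ or $C'^l \neq C^l$.
Applying Lemma 2, we get that there exists an SCC $C'^{\mathbb{M}_{k+1}}$
of $\mathcal{M}^{\mathbb{M}_{k+1}}$ that is derived from $(C^{\mathbb{M}_k}, C^l)$ and contains $s$.
Since $s \in C^{\mathbb{M}_{k+1}}$ and $s \in C'^{\mathbb{M}_{k+1}}$, from the definition of
SCC, it must be the case that $C^{\mathbb{M}_{k+1}} = C'^{\mathbb{M}_{k+1}}$; thus, $C^{\mathbb{M}_{k+1}}$
must be derived from $(C^{\mathbb{M}_k}, C^l)$. ■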

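Lemma 2 and Lemma 3 provide a way to generate all the
SCCs of $\mathcal{M}^{\mathbb{M}_{k+1}}$ from all the SCCs of $\mathcal{M}^{\mathbb{M}_k}$ and $\mathcal{M}_l$ as
formally stated below.

Corollary 1: The set of all the SCCs of $\mathcal{M}^{\mathbb{M}_{k+1}}$ is given
by

$$\{C^{\mathbb{M}_{k+1}} \text{ derived from } (C^{\mathbb{M}_k}, C^l) \mid C^{\mathbb{M}_k} \text{ is an SCC of } \mathcal{M}^{\mathbb{M}_k} \text{ and } C^l \text{ is an SCC of } \mathcal{M}_l\}.$$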
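The construction in Lemma 2 and Corollary 1 can be sketched in Python as follows. This is an illustrative sketch, not the authors' implementation: the function `derive_sccs`, the representation of SCCs as frozensets, the explicit sets of self-looping states and the toy data are all assumptions.

```python
def replace_at(s, i, r):
    """Return s|_{i<-r} for a composite state stored as a tuple."""
    return s[:i] + (r,) + s[i + 1:]


def derive_sccs(sccs_k, self_loops_k, sccs_l, self_loops_l, l):
    """Enumerate the SCCs derived from every pair (C^{M_k}, C^l)."""
    derived = []
    for C_k in sccs_k:
        # Cond 1: a single state without a self-loop.
        trivial_k = len(C_k) == 1 and next(iter(C_k)) not in self_loops_k
        for C_l in sccs_l:
            # Cond 2: a single state without a self-loop.
            trivial_l = len(C_l) == 1 and next(iter(C_l)) not in self_loops_l
            states = [replace_at(s, l, r) for s in C_k for r in C_l]
            if trivial_k or trivial_l:
                derived.extend(frozenset([x]) for x in states)  # singleton SCCs
            else:
                derived.append(frozenset(states))  # one combined SCC
    return derived


# Hypothetical SCCs of the previous composition ("x" is the placeholder
# state of the stationary agent 1) and of the newly added agent 1.
sccs_k = [frozenset({("c0", "x")}), frozenset({("c1", "x")})]
self_loops_k = {("c1", "x")}
sccs_l = [frozenset({"a", "b"}), frozenset({"d"})]
self_loops_l = {"d"}
sccs_next = derive_sccs(sccs_k, self_loops_k, sccs_l, self_loops_l, 1)
print(len(sccs_next))  # 5
```

No graph search is performed on the new composition; the SCCs are assembled purely from the SCCs of the two smaller models.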

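Finally, in the following lemma, we provide a necessary
condition, based on the partial order over the SCCs of $\mathcal{M}^{\mathbb{M}_k}$
and $\mathcal{M}_l$, for the existence of the partial order between two
SCCs of $\mathcal{M}^{\mathbb{M}_{k+1}}$.

Lemma 4: Let $C_1^{\mathbb{M}_{k+1}}$ and $C_2^{\mathbb{M}_{k+1}}$ be SCCs of $\mathcal{M}^{\mathbb{M}_{k+1}}$.
Suppose $C_1^{\mathbb{M}_{k+1}}$ is derived from $(C_1^{\mathbb{M}_k}, C_1^l)$ and $C_2^{\mathbb{M}_{k+1}}$
is derived from $(C_2^{\mathbb{M}_k}, C_2^l)$ where $C_1^{\mathbb{M}_k}$ and $C_2^{\mathbb{M}_k}$ are
SCCs of $\mathcal{M}^{\mathbb{M}_k}$ and $C_1^l$ and $C_2^l$ are SCCs of $\mathcal{M}_l$. Then,
$C_1^{\mathbb{M}_{k+1}} \prec_{\mathcal{M}^{\mathbb{M}_{k+1}}} C_2^{\mathbb{M}_{k+1}}$ only if
$C_1^{\mathbb{M}_k} \preceq_{\mathcal{M}^{\mathbb{M}_k}} C_2^{\mathbb{M}_k}$ and $C_1^l \preceq_{\mathcal{M}_l} C_2^l$.

Proof: Consider the case where $C_1^{\mathbb{M}_{k+1}} \prec_{\mathcal{M}^{\mathbb{M}_{k+1}}} C_2^{\mathbb{M}_{k+1}}$.
By definition, $Succ(C_2^{\mathbb{M}_{k+1}}) \cap C_1^{\mathbb{M}_{k+1}} \neq \emptyset$. Consider
a state $s' = \langle s'_0, \ldots, s'_N \rangle \in Succ(C_2^{\mathbb{M}_{k+1}}) \cap C_1^{\mathbb{M}_{k+1}}$.
Since $s' \in Succ(C_2^{\mathbb{M}_{k+1}})$, there exist $s = \langle s_0, \ldots, s_N \rangle \in C_2^{\mathbb{M}_{k+1}}$
and $\alpha \in Act^{\mathbb{M}_{k+1}}$ such that $P^{\mathbb{M}_{k+1}}(s, \alpha, s') > 0$.
But from the proof of Lemma 1, $P^{\mathbb{M}_{k+1}}(s, \alpha, s') =
P_l(s_l, s'_l) P^{\mathbb{M}_k}(\tilde{s}, \alpha, \tilde{s}')$ where $\tilde{s}$ and $\tilde{s}'$ are unique states in
$S^{\mathbb{M}_k}$ such that $\tilde{s}|_{l \leftarrow s_l} = s$ and $\tilde{s}'|_{l \leftarrow s'_l} = s'$. Thus, it must
be the case that $P_l(s_l, s'_l) > 0$ and $P^{\mathbb{M}_k}(\tilde{s}, \alpha, \tilde{s}') > 0$.

In addition, since $C_1^{\mathbb{M}_{k+1}}$ is derived from $(C_1^{\mathbb{M}_k}, C_1^l)$ and
$C_2^{\mathbb{M}_{k+1}}$ is derived from $(C_2^{\mathbb{M}_k}, C_2^l)$, from Lemma 2 and
Lemma 3, it must be the case that $\tilde{s} \in C_2^{\mathbb{M}_k}$, $\tilde{s}' \in C_1^{\mathbb{M}_k}$,
$s_l \in C_2^l$ and $s'_l \in C_1^l$. Since $\tilde{s} \in C_2^{\mathbb{M}_k}$, $\tilde{s}' \in C_1^{\mathbb{M}_k}$ and
$P^{\mathbb{M}_k}(\tilde{s}, \alpha, \tilde{s}') > 0$, we can conclude that $\tilde{s}' \in Succ(C_2^{\mathbb{M}_k}) \cap
C_1^{\mathbb{M}_k}$, and therefore, by definition, $C_1^{\mathbb{M}_k} \preceq_{\mathcal{M}^{\mathbb{M}_k}} C_2^{\mathbb{M}_k}$.
Similarly, since $s_l \in C_2^l$, $s'_l \in C_1^l$ and $P_l(s_l, s'_l) > 0$, we
can conclude that $s'_l \in Succ(C_2^l) \cap C_1^l$, and therefore, by
definition, $C_1^l \preceq_{\mathcal{M}_l} C_2^l$. ■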
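Because Lemma 4 gives only a necessary condition, it can be used as a filter that rules out order relations between derived SCCs without inspecting any transition of the new composition. The Python sketch below illustrates this under assumptions: SCCs are referred to by hypothetical labels, and each order is given as a set of pairs `(C, C')` meaning that `C` precedes or equals `C'`.

```python
def may_precede(pair_1, pair_2, preceq_k, preceq_l):
    """Necessary condition of Lemma 4.

    pair_i = (C_i^{M_k}, C_i^l) is the pair from which the i-th SCC is derived.
    Returns False when the first derived SCC certainly does not precede the second.
    """
    (c_k1, c_l1), (c_k2, c_l2) = pair_1, pair_2
    return (c_k1, c_k2) in preceq_k and (c_l1, c_l2) in preceq_l


# Hypothetical orders: A precedes B in the previous composition,
# X precedes Y in the model of the newly added agent.
preceq_k = {("A", "A"), ("B", "B"), ("A", "B")}
preceq_l = {("X", "X"), ("Y", "Y"), ("X", "Y")}

print(may_precede(("A", "X"), ("B", "Y"), preceq_k, preceq_l))  # True
print(may_precede(("B", "X"), ("A", "Y"), preceq_k, preceq_l))  # False
```

A `True` result does not establish the relation; it only means the pair cannot be excluded, so any total order consistent with all non-excluded pairs is safe to use.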



Fig. 1. The road and its partition (cells $c_0, \ldots, c_4$) used in the autonomous vehicle example.



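D. Computation of Probability Vector and Control Policy for
$\mathcal{M}_p^{\mathbb{M}_k}$ from SCCs of $\mathcal{M}^{\mathbb{M}_k}$

Consider an arbitrary iteration $k \geq 0$
and the associated product MDP $\mathcal{M}_p^{\mathbb{M}_k} =
(S_p^{\mathbb{M}_k}, Act_p^{\mathbb{M}_k}, P_p^{\mathbb{M}_k}, \iota_{p,init}^{\mathbb{M}_k}, \Pi_p^{\mathbb{M}_k}, L_p^{\mathbb{M}_k})$. Similar to the
SCC-based value iteration, we want to generate a
partition $\{D_{p,1}^{\mathbb{M}_k}, \ldots, D_{p,m_k}^{\mathbb{M}_k}\}$ of $S_p^{\mathbb{M}_k}$ with a partial
order $\prec_{\mathcal{M}_p^{\mathbb{M}_k}}$ such that $D_{p,j}^{\mathbb{M}_k} \prec_{\mathcal{M}_p^{\mathbb{M}_k}} D_{p,i}^{\mathbb{M}_k}$ if
$Succ(D_{p,i}^{\mathbb{M}_k}) \cap D_{p,j}^{\mathbb{M}_k} \neq \emptyset$. However, we relax the condition
that each $D_{p,i}^{\mathbb{M}_k}$, $i \in \{1, \ldots, m_k\}$, is an SCC of $\mathcal{M}_p^{\mathbb{M}_k}$ and
only require that if $D_{p,i}^{\mathbb{M}_k}$ contains a state in an SCC $C_p^{\mathbb{M}_k}$ of
$\mathcal{M}_p^{\mathbb{M}_k}$, then it has to contain all the states in $C_p^{\mathbb{M}_k}$. Hence,
$D_{p,i}^{\mathbb{M}_k}$ may include all the states in multiple SCCs of $\mathcal{M}_p^{\mathbb{M}_k}$.
The following lemmas provide a method for constructing
$\{D_{p,1}^{\mathbb{M}_k}, \ldots, D_{p,m_k}^{\mathbb{M}_k}\}$ and their partial order from SCCs of
$\mathcal{M}^{\mathbb{M}_k}$ and their partial order, which can be incrementally
constructed as described in Section V-C.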

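Lemma 5: Let $C_p^{\mathbb{M}_k}$ be an SCC of $\mathcal{M}_p^{\mathbb{M}_k}$. Then, there
exists a unique SCC $C^{\mathbb{M}_k}$ of $\mathcal{M}^{\mathbb{M}_k}$ such that
$C_p^{\mathbb{M}_k} \subseteq C^{\mathbb{M}_k} \times Q$.

Proof: This follows from the definition of product MDP
that for any $s, s' \in S^{\mathbb{M}_k}$ and $q, q' \in Q$, there is a path from
$\langle s, q \rangle$ to $\langle s', q' \rangle$ in $\mathcal{M}_p^{\mathbb{M}_k}$ only if there is a path from $s$ to $s'$
in $\mathcal{M}^{\mathbb{M}_k}$. ■

Lemma 6: Let $C_p^{\mathbb{M}_k}$ and $\tilde{C}_p^{\mathbb{M}_k}$ be SCCs of $\mathcal{M}_p^{\mathbb{M}_k}$. Suppose
$C^{\mathbb{M}_k}$ and $\tilde{C}^{\mathbb{M}_k}$ are unique SCCs of $\mathcal{M}^{\mathbb{M}_k}$ such
that $C_p^{\mathbb{M}_k} \subseteq C^{\mathbb{M}_k} \times Q$ and $\tilde{C}_p^{\mathbb{M}_k} \subseteq \tilde{C}^{\mathbb{M}_k} \times Q$. Then,
$C_p^{\mathbb{M}_k} \prec_{\mathcal{M}_p^{\mathbb{M}_k}} \tilde{C}_p^{\mathbb{M}_k}$ only if $C^{\mathbb{M}_k} \preceq_{\mathcal{M}^{\mathbb{M}_k}} \tilde{C}^{\mathbb{M}_k}$.

Proof: This follows from the definition of product MDP
since for any $\langle s, q \rangle \in S_p^{\mathbb{M}_k}$, $\langle \tilde{s}, \tilde{q} \rangle \in S_p^{\mathbb{M}_k}$ is a successor of
$\langle s, q \rangle$ in $\mathcal{M}_p^{\mathbb{M}_k}$ only if $\tilde{s}$ is a successor of $s$ in $\mathcal{M}^{\mathbb{M}_k}$. ■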

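Lemma 7: Let $C_1^{\mathbb{M}_k}, \ldots, C_{m_k}^{\mathbb{M}_k}$ be all the SCCs of $\mathcal{M}^{\mathbb{M}_k}$
and for each $i \in \{1, \ldots, m_k\}$, let $D_{p,i}^{\mathbb{M}_k} = C_i^{\mathbb{M}_k} \times Q$. Then,
$\{D_{p,1}^{\mathbb{M}_k}, \ldots, D_{p,m_k}^{\mathbb{M}_k}\}$ is a partition of $S_p^{\mathbb{M}_k}$. In addition, the
following statements hold for all $i, j \in \{1, \ldots, m_k\}$.

• If $D_{p,i}^{\mathbb{M}_k}$ contains a state in an SCC $C_p^{\mathbb{M}_k}$ of $\mathcal{M}_p^{\mathbb{M}_k}$, then
it contains all the states in $C_p^{\mathbb{M}_k}$.
• $Succ(D_{p,i}^{\mathbb{M}_k}) \cap D_{p,j}^{\mathbb{M}_k} \neq \emptyset$ only if $C_j^{\mathbb{M}_k} \preceq_{\mathcal{M}^{\mathbb{M}_k}} C_i^{\mathbb{M}_k}$.

Proof: Consider arbitrary $i, j \in \{1, \ldots, m_k\}$. It follows
directly from Lemma 5 that if $D_{p,i}^{\mathbb{M}_k}$ contains a state in an
SCC $C_p^{\mathbb{M}_k}$ of $\mathcal{M}_p^{\mathbb{M}_k}$, then it contains all the states in $C_p^{\mathbb{M}_k}$.

Next, consider the case where $Succ(D_{p,i}^{\mathbb{M}_k}) \cap D_{p,j}^{\mathbb{M}_k} \neq \emptyset$.
Then, from Lemma 5, there exist SCCs $C_{p,i}^{\mathbb{M}_k} \subseteq D_{p,i}^{\mathbb{M}_k}$ and
$C_{p,j}^{\mathbb{M}_k} \subseteq D_{p,j}^{\mathbb{M}_k}$ of $\mathcal{M}_p^{\mathbb{M}_k}$ such that $Succ(C_{p,i}^{\mathbb{M}_k}) \cap C_{p,j}^{\mathbb{M}_k} \neq \emptyset$.
Thus, $C_{p,j}^{\mathbb{M}_k} \prec_{\mathcal{M}_p^{\mathbb{M}_k}} C_{p,i}^{\mathbb{M}_k}$. Applying Lemma 6, we get
$C_j^{\mathbb{M}_k} \preceq_{\mathcal{M}^{\mathbb{M}_k}} C_i^{\mathbb{M}_k}$. ■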



Applying Lemma 7, we generate a partition
$\{D_{p,1}^{\mathbb{M}_k}, \ldots, D_{p,m_k}^{\mathbb{M}_k}\}$ of $S_p^{\mathbb{M}_k}$ where for each
$i \in \{1, \ldots, m_k\}$, $D_{p,i}^{\mathbb{M}_k} = C_i^{\mathbb{M}_k} \times Q$ and $C_1^{\mathbb{M}_k}, \ldots, C_{m_k}^{\mathbb{M}_k}$
are all the SCCs of $\mathcal{M}^{\mathbb{M}_k}$. A partial order $\prec_{\mathcal{M}_p^{\mathbb{M}_k}}$ over
this partition is defined such that $D_{p,i}^{\mathbb{M}_k} \prec_{\mathcal{M}_p^{\mathbb{M}_k}} D_{p,j}^{\mathbb{M}_k}$
if $C_i^{\mathbb{M}_k} \prec_{\mathcal{M}^{\mathbb{M}_k}} C_j^{\mathbb{M}_k}$. Hence, an order $O_p^{\mathbb{M}_k}$ among
$D_{p,1}^{\mathbb{M}_k}, \ldots, D_{p,m_k}^{\mathbb{M}_k}$ can be simply derived from the order of
$C_1^{\mathbb{M}_k}, \ldots, C_{m_k}^{\mathbb{M}_k}$, which can be incrementally constructed
based on Lemma 4. This order $O_p^{\mathbb{M}_k}$ has the property
that the probability values of states in $D_{p,j}^{\mathbb{M}_k}$ that appears
after $D_{p,i}^{\mathbb{M}_k}$ in $O_p^{\mathbb{M}_k}$ cannot affect the probability values of
states in $D_{p,i}^{\mathbb{M}_k}$. Hence, we can follow the SCC-based value
iteration and process each $D_{p,i}^{\mathbb{M}_k}$ separately, according to
the order in $O_p^{\mathbb{M}_k}$, to compute the probability $x_s$ for all
$s \in D_{p,i}^{\mathbb{M}_k}$. Finally, we generate a memoryless control policy
$\mathcal{C}^{\mathbb{M}_k}$ from the probability vector $(x_s)_{s \in S_p^{\mathbb{M}_k}}$ as described at
the end of Section IV.
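The block-wise processing described above can be sketched in Python as follows. This is a simplified illustration rather than the procedure of Section IV: the function `blockwise_value_iteration`, the dictionary encoding of the product MDP, the set `goal` of target product states and the toy model are assumptions. The blocks are built as $C_i \times Q$ and visited in an order in which every successor outside a block lies in an earlier block.

```python
def blockwise_value_iteration(blocks, actions, P, goal, tol=1e-9):
    """Maximum reachability probabilities, computed one block at a time.

    blocks  : list of sets of product states, in processing order
    actions : dict mapping a state to its enabled actions
    P       : dict mapping (state, action) to {successor: probability}
    goal    : set of target product states
    """
    x = {}
    for block in blocks:
        for s in block:
            x[s] = 1.0 if s in goal else 0.0
        while True:  # iterate only over the current block
            delta = 0.0
            for s in block:
                if s in goal:
                    continue
                best = max(
                    (sum(p * x[t] for t, p in P[(s, a)].items()) for a in actions[s]),
                    default=0.0,
                )
                delta = max(delta, abs(best - x[s]))
                x[s] = best
            if delta < tol:
                break
    return x


def st(name):
    """Hypothetical product state <name, q0> with a single DFA state."""
    return (name, "q0")


Q = ["q0"]
ordered_sccs = [{"goal"}, {"fail"}, {"s1"}, {"s0"}]  # SCCs of the composition
blocks = [{(s, q) for s in C for q in Q} for C in ordered_sccs]  # D_i = C_i x Q
actions = {st("goal"): ["stay"], st("fail"): ["stay"], st("s1"): ["a"], st("s0"): ["a", "b"]}
P = {
    (st("goal"), "stay"): {st("goal"): 1.0},
    (st("fail"), "stay"): {st("fail"): 1.0},
    (st("s1"), "a"): {st("goal"): 0.8, st("fail"): 0.2},
    (st("s0"), "a"): {st("s1"): 1.0},
    (st("s0"), "b"): {st("goal"): 0.5, st("fail"): 0.5},
}
x = blockwise_value_iteration(blocks, actions, P, {st("goal")})
print(x[st("s0")])  # 0.8
```

A memoryless policy can then be read off by choosing, in each state, an action that attains the maximum; this step is omitted here for brevity.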



VI. Experimental Results 

Consider, once again, the autonomous vehicle problem
described in Example 1 and Example 2. Suppose the road is
discretized into 5 cells $c_0, \ldots, c_4$ where $c_2$ is the pedestrian
crossing area as shown in Figure 1. The vehicle starts in cell
$c_0$ and has to reach cell $c_4$. There are 5 pedestrians, modeled
by MCs $\mathcal{M}_1, \ldots, \mathcal{M}_5$, initially at cell $c_1$. The models of
the vehicle and the pedestrians are shown in Figure 2. A
DFA $\mathcal{A}_\varphi$ that accepts all and only words in $pref(\varphi)$ where
$\varphi = \left(\neg \bigvee_{i \geq 1, j \geq 0} (c_j^0 \wedge c_j^i)\right) \, \mathcal{U} \, c_4^0$ is shown in Figure 3.

First, we apply the LP-based, value iteration and SCC-based
value iteration techniques described in Section IV to
synthesize a control policy that maximizes the probability
that the complete system $\mathcal{M} = \mathcal{T} \| \mathcal{M}_1 \| \mathcal{M}_2 \| \cdots \| \mathcal{M}_5$
satisfies $\varphi$. The time required for each step of computation
is summarized in Table I. All the approaches yield the
probability of 0.8 that $\mathcal{M}$ satisfies $\varphi$ under the synthesized
control policy. The comparison of the total computation time
required for these different approaches is shown in Figure 4.
As discussed in Section IV-C, although the SCC-based
value iteration itself takes significantly less computation time
than the LP-based technique or value iteration, the time
spent in identifying SCCs and their order renders the total
computation time of the SCC-based value iteration more than
the other two approaches.

Fig. 4. Comparison of the computation time and the probability for the
system to satisfy the specification computed using different techniques.



| Technique | $\mathcal{M}_p$ | SCCs & order of $\mathcal{M}_p$ | Prob vector | Control policy | Total |
|---|---|---|---|---|---|
| LP | 156.3 | - | 8.8 | 6.8 | 171.9 |
| Value iteration | 156.3 | - | 31.3 | 6.8 | 194.4 |
| SCC-based value iteration | 156.3 | 71.1 | 1.9 | 6.8 | 236.1 |

TABLE I. Time required (in seconds) for computing various objects
using different techniques when the full models of all the
environment agents are considered.



Next, we apply the incremental technique where we
progressively compute a sequence of control policies as
more agents are added to the synthesis procedure in each
iteration as described in Section V. We let $\mathbb{M}_0 = \emptyset$, $\mathbb{M}_1 =
\{\mathcal{M}_1\}$, $\mathbb{M}_2 = \{\mathcal{M}_1, \mathcal{M}_2\}$, ..., $\mathbb{M}_5 = \{\mathcal{M}_1, \ldots, \mathcal{M}_5\}$,
i.e., we successively add each pedestrian $\mathcal{M}_1, \mathcal{M}_2, \ldots, \mathcal{M}_5$,
respectively, in each iteration. We consider 2 cases: (1) no
incremental construction of various objects is employed (i.e.,
when $\mathcal{M}^{\mathbb{M}_{k+1}}$ and $\mathcal{M}_p^{\mathbb{M}_{k+1}}$, $k \geq 0$, are computed from
scratch in every iteration), and (2) incremental construction
of various objects as described in Sections V-B to V-D is
applied. For the first case, we apply the LP-based technique
to compute the probability vector as it has been shown to be
the fastest technique when applied to this problem, taking
into account the required pre-computation, which needs to
be done in every iteration. For both cases, 6 control policies
$\mathcal{C}^{\mathbb{M}_0}, \ldots, \mathcal{C}^{\mathbb{M}_5}$ are generated for $\mathcal{M}^{\mathbb{M}_0}, \ldots, \mathcal{M}^{\mathbb{M}_5}$, respectively.
For each policy $\mathcal{C}^{\mathbb{M}_k}$, we compute the probability
$\Pr_{\mathcal{M}}^{\mathcal{C}^{\mathbb{M}_k}}(\varphi)$ that the complete system $\mathcal{M}$ satisfies $\varphi$ under
policy $\mathcal{C}^{\mathbb{M}_k}$. (Note that $\mathcal{C}^{\mathbb{M}_k}$, when applied to $\mathcal{M}$, is only
a function of states of $\mathcal{M}_i \in \mathbb{M}_k$ since it assumes that
the other agents $\mathcal{M}_j \notin \mathbb{M}_k$ are stationary.) These probabilities
are given by $\Pr_{\mathcal{M}}^{\mathcal{C}^{\mathbb{M}_0}}(\varphi) = 0.08$, $\Pr_{\mathcal{M}}^{\mathcal{C}^{\mathbb{M}_1}}(\varphi) = 0.46$,
$\Pr_{\mathcal{M}}^{\mathcal{C}^{\mathbb{M}_2}}(\varphi) = 0.57$, $\Pr_{\mathcal{M}}^{\mathcal{C}^{\mathbb{M}_3}}(\varphi) = 0.63$,
$\Pr_{\mathcal{M}}^{\mathcal{C}^{\mathbb{M}_4}}(\varphi) = 0.67$ and $\Pr_{\mathcal{M}}^{\mathcal{C}^{\mathbb{M}_5}}(\varphi) = 0.8$.



The comparison of the cases where the incremental construction
of various objects is not and is employed is shown
in Figure 4. A jump in the probability occurs each time a
new control policy is computed. The time spent during each
step of computation is summarized in Table II and Table III
for the first and the second case, respectively. Notice that the
time required for identifying the SCCs and their order when
the incremental approach is applied is significantly less than
when the full model of all the pedestrians is considered in
one shot since $\mathcal{M}^{\mathbb{M}_0}, \mathcal{M}_1, \ldots, \mathcal{M}_5$, each of which contains
3 states, are much smaller than $\mathcal{M}_p$, which contains 2187
states.

From Figure 4, our incremental approach is able to obtain
an optimal control policy faster than any of the other techniques.
This is mainly due to the efficiency of our incremental
construction of SCCs and their order. In addition, we are
able to obtain a reasonable solution, with 0.67 probability
of satisfying $\varphi$, within 12 seconds while the maximum
probability of satisfying $\varphi$ is 0.8, which requires 160 seconds
of computation (or 171.9 seconds without employing the
incremental approach).

| Iteration | $\mathcal{M}^{\mathbb{M}_k}$ | $\mathcal{M}_p^{\mathbb{M}_k}$ | Prob vector | Control policy | Total |
|---|---|---|---|---|---|
| 0 | 0.0064 | 0.0185 | 0.0464 | 0.0084 | 0.08 |
| 1 | 0.0123 | 0.0762 | 0.0203 | 0.0104 | 0.12 |
| 2 | 0.0154 | 0.3383 | 0.0231 | 0.0296 | 0.41 |
| 3 | 0.0357 | 1.7055 | 0.0542 | 0.1503 | 1.95 |
| 4 | 0.1393 | 9.1950 | 0.2155 | 0.7975 | 10.35 |
| 5 | 3.1836 | 152.86 | 8.2302 | 6.8938 | 171.17 |

TABLE II. Time required (in seconds) for computing various objects in
each iteration when incremental construction is not applied.



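| Iteration | $\mathcal{M}^{\mathbb{M}_0}$ | SCCs & order of $\mathcal{M}^{\mathbb{M}_0}$, $\mathcal{M}_1, \ldots, \mathcal{M}_5$ | $\mathcal{M}_p^{\mathbb{M}_k}$, partition & order | Prob vector | Control policy | Total |
|---|---|---|---|---|---|---|
| 0 | 0.0055 | 0.0043 | 0.0203 | 0.0112 | 0.0036 | 0.04 |
| 1 | - | - | 0.0726 | 0.0102 | 0.0087 | 0.09 |
| 2 | - | - | 0.3239 | 0.0193 | 0.0282 | 0.37 |
| 3 | - | - | 1.6036 | 0.0567 | 0.1424 | 1.80 |
| 4 | - | - | 8.6955 | 0.1876 | 0.7755 | 9.66 |
| 5 | - | - | 139.27 | 1.6122 | 7.0125 | 147.89 |

TABLE III. Time required (in seconds) for computing various objects in
each iteration when incremental construction is applied.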
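The anytime behavior quoted in the text (a policy with probability 0.67 within about 12 seconds, and the optimal policy after about 160 seconds) can be checked by accumulating the per-iteration totals of Table III. The short Python sketch below does this arithmetic; the variable names are chosen here for illustration.

```python
from itertools import accumulate

# Per-iteration totals (seconds) from the "Total" column of Table III.
totals_incremental = [0.04, 0.09, 0.37, 1.80, 9.66, 147.89]

# Elapsed time at which the policy of each iteration becomes available.
elapsed = list(accumulate(totals_incremental))
for k, t in enumerate(elapsed):
    print(f"policy of iteration {k} available after {t:.2f} s")
```

The cumulative time after iteration 4 is 11.96 s and after iteration 5 is 159.85 s, which agrees with the rounded figures of 12 and 160 seconds reported in the text.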



VII. Conclusions and Future Work 

An anytime algorithm for synthesizing a control policy for 
a robot interacting with multiple environment agents with 
the objective of maximizing the probability for the robot to 
satisfy a given temporal logic specification was proposed. 
Each environment agent is modeled by a Markov chain 
whereas the robot is modeled by a finite transition system 
(in the deterministic case) or Markov decision process (in 
the stochastic case). The proposed algorithm progressively 
computes a sequence of control policies, taking into account 
only a small subset of the environment agents initially and 
successively adding more agents to the synthesis procedure 
in each iteration until we hit the constraints on computational 
resources. Incremental construction of various objects needed 
to be computed during the synthesis procedure was proposed. 
Experimental results showed that not only do we obtain a
reasonable solution much faster than existing approaches,
but we are also able to obtain an optimal solution faster
than existing approaches.

Future work includes extending the algorithm to handle 
full LTL specifications. This direction appears to be promis- 
ing because the remaining step is only to incrementally 
construct accepting maximal end components of an MDP. 
We are also examining an effective approach to determine an
agent to be added in each iteration. As mentioned in Section
V-A, such an agent may be picked based on the result from
probabilistic verification, but this comes at the extra cost of
adding the verification phase.